The problem
Given a set of cities, and a cost to travel between each pair of cities,
determine whether there is a path that visits every city once
and returns to the first city, such that the cost travelled is less than
some number C.
This problem is often stated such that we need to find the minimum such
C possible, but many common formulations of NP-completeness require the
problems to be decision problems. In general, if an efficient solution
exists for a decision problem, an efficient solution also exists for the
corresponding optimization problem, and decision problems are a little
bit easier to analyze.
The problem is also often stated such that we need to find the path rather than find the existence of the path. In practice, it's likely that an algorithm that solved this problem would answer yes by actually finding a path, but it's not strictly necessary for the the problem to be considered NP-complete.
I'd like to thank jam-master jim for pointing out to me that I made this mistake. Interestingly, this doesn't really change the problem into an optimization problem, since the optimization version of the problem above is to find the minimum C that works. Still, if I wanted to actually use that version of the problem to talk about NP-completeness, I'd have to use a more complicated (but equivalent) definition of NP-complete than is generally used, and it's better to just use the same definition everyone else in the planet agrees on.
NP-Completeness
This problem turns out to be NP-complete, by a polynomial time
transformation from the Hamiltonian Circuit (HC) problem. It's a
very interesting result, because it's actually slightly stronger than
just a result about the NP-completeness of TSP, it also suggests that
finding a general polynomial time approximation algorithm for TSP
implies that P=NP.
The transformation is pretty simple. In HC, we're given a graph.
To transform it into an instance of TSP, let each vertex in the graph be a city.
The cost of traveling between two cities will be 1 if there's an edge
between those cities in the graph. If no such edge exists, the cost to travel
will be m=2n (where n is the number of cities). The question: does there
exist a traveling salesman tour with cost n?
If there's a hamiltonian circuit in the original graph, there will be at
least one traveling salesman path with cost n (the hamiltonian circuit
is one such path.)
Inapproximability of the General Problem
I'll prove that there's no polynomial time approximation algorithm
possible (unless P=NP) by
contradiction. Assume there's an approximation algorithm for TSP. The
best possible approximation algorithm would find a solution with cost at
no worse than one more than the optimal solution. In the instance of
TSP we've just created, this would be asking if there's a traveling
salesman path with cost n+1. Any such path turns out to be the path of
cost n (the possible costs of paths in this graph are
k+m(n-k), where k
is the number of edges that cost 1 in the path. The only way this
number can be less than or equal to n+1 is if
k=n (otherwise n-k is an
integer greater than 0, so the path costs at least m, which is greater
than n+1.)
Therefore, if we find an approximation algorithm for TSP, we've found a
polynomial time algorithm for solving HC, and P=NP. (It's possible to
construct a similar proof about approximation algorithms that find an
answer no worse than k*optimal, for any value of k. This is left as an
exercise for the reader.)
Approximability of a Special Case
Wait a minute... I just said it's impossible to find an approximation
algorithm, and now I'm about to go back on myself. It turns out that if
you add more constraints to the problem, it gets easier. For example,
if we assume that the triangle inequality holds for the cost function,
it's possible to find an algorithm that can find an answer that's less
than twice the optimal answer.
For the triangle inequality to hold, that means that if we have three
cities a, b, and c, the following equation is true:
cost(a,c) ≤ cost(a,b) + cost(b,c)
In particular, this is true if the costs are the distances between
cities (just an interesting fact about
Euclidean geometry...) It's
also not true about the graphs constructed in the example above, so it's
possible that those results don't apply...
It turns out that there are polynomial time algorithms to come up with
minimum spanning trees. A minimum spanning tree spans the whole
graph (i.e. it contains every vertex in the graph). A traveling
salesman circuit can't possibly cost less than the minimum spanning tree
on a graph (it has to contain every node, too.)
The algorithm to find a good traveling salesman path is as follows:
start with the minimum spanning tree. Start at the root of the tree,
and do a depth-first traversal of the tree. If we just left it here,
we wouldn't have a possible solution to TSP, because we use each edge in
the tree twice. Nevertheless, if this was a good solution, it would
cost at most twice the optimal cost (since we're using edges twice, and
the cost of these edges is the minimal amount for any tree that connects
all vertices of the graph.)
So any time we're about to repeat visiting a node in
our traversal, we skip that node, and go to the next node we would have
visited. Because of the triangle inequality, the distance we travel by
skipping nodes in a path must be less than taking all the nodes. So we
can keep taking these shortcuts, and end up with a hamiltonian circuit
that costs less than or equal to twice the optimal possible path.
Better approximation algorithms exist, but they're more complicated.
Many real-world examples where solving TSP would be nice satisfy the
triangle inequality, so this is a useful problem to look at.
Unfortunately, many real-world problems don't satisfy the triangle
inequality. As one example, it's sometimes possible to fly from city a
to city b and then on to city c cheaper than it would have been possible
to fly directly from a to c.
References: