Definition: Given an alphabet S, one instance of Post's correspondence problem of size s is a finite set of pairs of strings (gi , hi) ( i = 1...s s>=1) over the alphabet S. A solution of length n >= 1 to this instance is a sequence i1 i2 ... in of selections such that the strings gi1gi2 ... gin and hi1hi2 ... hin formed by concatenation are identical.*

Basically, you've got two sets of strings composed from the same alphabet. You have to build a new string that is some arrangement of strings taken from the first set. This string must also match a string that can be built from the strings available in the second set. The only other qualification is that the list of index numbers for the strings used from the first set must be the same as the list of index numbers of the strings used from the second set. The new string you're forming can be arbitrarily long.

It's been shown that this problem is recursive but not recursively enumerable (recursive in the sense of being decidable, not solvable by recursion in programming). That means that you can recognize instances of the PCP that are solvable, but you can't build a program that will look at an arbitrary instance of PCP and decide whether there's a solution. This puts it in the same class of problems as the halting problem. If you could solve arbitrary instances of PCP, you could solve arbitrary instances of the halting problem (which you certainly can't).

Also note that the acronym is PCP, also the acronym used for the dissociative anaesthetic (and drug of abuse) phencyclidine. Imagine the world of puns you can create from this:

"How's the problem set coming?"
"I just wish I didn't have to do all this goddamn PCP."

Yeah, sorry, that was pretty lame. Couldn't resist.

*Text of definition shamelessly copied from http://www.cs.ualberta.ca/~zhao/PCP/intro.htm

Emil Post's work on undecidability is often overshadowed by the contributions of his more famous contemporary, Alan Turing. The undecidable nature of the halting problem has an elegant proof itself, but emulating Turing machines in other problems to prove their undecidability is a delicate and messy business. Post's Correspondence Problem, introduced in 1946, is a more natural fit for this role, and since it can itself mimic the halting problem, is of equal power in proving undecidability. Thus it found application to formal language theory; it can also be elegantly applied to the analysis of communication protocols, which I wish to cover here.

Post's Correspondence Problem

Firstly, here's an easy way to understand Post's Correspondence Problem. Imagine a particularly fiendish variant of Scrabble, whereby the tiles are split, like dominoes, into two halves, each bearing a string of letters. The question is, can we line up some (non-zero) number of tiles so that the top and botttom halves both spell out the same 'word' (simply a string of letters)? The types of tiles are fixed, but you may use as many of each type as you wish.

For instance, if our tile set is the following:

   _______      _______      _______
  |       |    |       |    |       |
  |   a   |    | abaaa |    |   ab  |
  |   -   |    |   -   |    |   -   |
  |  aaa  |    |   ab  |    |   b   |
  |_______|    |_______|    |_______|      

     (1)          (2)          (3)

Then the arrangement (2)(1)(1)(3) gives the 'word' abaaa a a ab = abaaaaaab on the top, and ab aaa aaa b = abaaaaaab on the bottom: that is, we have a match.

As observed above, the general form of this innocent-looking problem is undecidable: no program can take an arbitrary set of tiles Ai= gi/hi and decide whether there are matching sequences g(i1)...g(in)=g=h=h(i1)...h(in).

The Secrecy problem

The secrecy problem is the obvious question in the analysis of cryptographic protocols: given a protocol and some intended secret S, can an intruder intercept S? Answering this question depends of course on the powers attributed to the intruder, and the restrictions placed on the protocol. But I'll demonstrate here how even a fairly restricted definition of protocol - exploiting neither unbounded parallel sessions nor unbounded numbers of nonces (in fact, no nonces will be required at all!) - reduces to PCP. This means of course that secrecy is undecidable, else we'd be able to determine the answer to arbitrary PCP instances by encoding them into secrecy problems and answering those.

Reduction of Secrecy to PCP

The trick to proving undecidability of secrecy is to convert the correspondence problem into a communication protocol, in such a way that the secret is disclosed if and only if the encoded PCP has a solution. We give each tile to a participant; the holder of tile i will be called Appenderi. There will also be two other legitimate participants, the Initiator and the compromiser. The Appenders, Initiator and compromiser are all assumed to share a secret key which is unknown to any intruder.

To start a run of the protocol, the Initiator encrypts a pair of messages, both consisting of the empty string, into a packet and sends it to any other participant. Any Appender receiving a packet decrpyts it, and changes the strings by appending the top half of their tile to the first string and the bottom half to the second string. They then pair these two new strings together, encrpyt into a packet, and send to anyone other than the initiator. In this way, candidates for matching strings are generated, under encryption.

Should the compromiser receive a packet containing two identical non-empty strings, they send back the secret message S "in the clear" - that is, without any encryption. So an intruder will discover the secret iff the compromiser receives a solution to the PCP problem, and thus only iff such a solution exists.

For the PCP instance above, the secret would be disclosed as follows. Firstly, the initiator sends a pair of empty strings to Appender2. He examines his tile, and creates a pair (abaaa,ab) which is sent (encrypted) to Appender1. She adds the contents of her tile, producing (abaaaa,abaaa) then sends this to herself for another round, creating (abaaaaa,abaaaaaa). This time she sends it to Appender3, who generates (abaaaaaab,abaaaaaab) and forwards this to the compromiser. He discovers the two strings match, so sends S back to Appender3, which, bereft of encryption, is then known to any eavesdropper.

We can formalise all this as follows, firstly fixing some notation: Let the ith tile have gi written on the top half, and hi written on the bottom half: an empty string of no letters will be denoted by e, an arbitrary string by a capital letter, an arbitrary letter by s and the concatenation of strings simply by writing them next to each other. Let K be a key shared between each of the Appenderis, the initiator and the compromiser; P be any participant other than the initiator, and S the secret. This gives roles for the legitimate participants as follows:

Initiator role
Init-->P: {(e,e)}K
Appender roles (one for each tile)
(P or Init)-->Appenderi: {(X,Y)}K
Appenderi-->P: {(Xgi,Yhi)}K
Compromiser role
P-->Comp: {(Xs,Xs)}K
Comp-->P: S

Which would give a secret-disclosing run of our example as follows:


Init-->Appender2: {(e,e)}K
Appender2-->Appender1: {(abaaa,ab)}K
Appender1-->Appender1: {(abaaaa,abaaa)}K
Appender1-->Appender3: {(abaaaaa,abaaaaaa)}K
Appender3-->Comp: {(abaaaaaab,abaaaaaab)}K
Comp-->Appender3: S

Thus, since such a run is one of the many possibilities for this protocol, secrecy is not guaranteed. Since the existence of the compromising run depends on the existence of a PCP solution, the secrecy problem is hence undecidable.

Criticisms and variations

The analysis of cryptographic protocols is made difficult by various sources of infinity- a priori, the number of sessions, the number of nonces, and the length of messages could all be unbounded. The example above is particularly devastating since it does not require the first two of these. However, since Post's Correspondence Problem does not restrict the number of tiles used the messages involved are of unbounded length. This, coupled with the 'blind copying' ability such as that of the Appender roles, means we can encode computational models.

However, there is a strong criticism that these protocols are pathological examples: they serves no useful communication purpose, having been crafted with the benefit of an intruder in mind and to prove the existence of a reduction. But practical interest is not in the class of all protocols, but of all that we might wish to use- those with an 'honest run' whereby all the roles introduced are necessary components for some legitimate purpose such as authentication. We cannot exclude blind copying completely, however, since such manipulations are necessary in, for instance, key exchange via a trusted server.

Decidability of secrecy remains an open problem for protocols with honest runs restricted to either bounded message size; or such that participants may make only a single blind copy. Sadly, even with the 'honest run' requirement, unbounded message size gives rise to an undecidable secrecy problem, but there is a decidable case: bounded nonces with participants allowed only a single blind copy.

References

CM30071: Logic and its Applications Undergraduate module, Computer Science Dept., University of Bath.
Formal analysis of Cryptographic Protocols Taught Postgraduate course, Laboratory for Foundations of Computer Science, University of Edinburgh.
Wikipedia: Emil Post, Post's Correspondence Problem.

Log in or register to write something here or to contact authors.