Although sorting in alphabetical order may seem at first glance to be too simple and unambiguous to worry about, there are in fact some issues which need to be considered in practical applications.

  • Spaces and non-alphabetical characters: In a list of items where some or all have more than one word, or where there are punctuation marks or hyphenated words, three approaches are possible:
    1. The space or punctuation mark is ignored and the sort is based on the concatenated letters: "wall - wallaby - wallet - wall-flower - wall game". This is relatively easy to implement, but closely related items (e.g. related words in a dictionary) can end up separated by unrelated ones, as wallaby and wallet separate wall from wall-flower in the example. Dictionaries sometimes compromise by sticking with this "strict" alphabetical order while using typographic devices to group related headwords where they happen to be consecutive.
    2. Spaces or hyphens can be treated as a grouping device, so that all strings starting with a particular discrete word are grouped before those starting with a subsequent word. This is the most appropriate method for lists of names: in the phone book, "Smith Z." will be listed before "Smithers A." It still has to be decided, of course, how to handle "Smith-Fitzgerald M.", which depends on whether the hyphen counts as a word break.
    3. The space or punctuation is treated identically to other characters, with a sort order of its own. This is normal in computer string handling (because it does not involve any exceptional processing); if the space/punctuation is given a value lower than A (as the space, apostrophe and hyphen are in ASCII) then this will give a similar result to system 2. It may however produce undesirable results: in a list of names, O'Grady will counter-intuitively come before Oates.
  • Common name elements: When sorting by surname, it is standard practice in some fields (e.g. shelving books by author in libraries) to group all variations of the Scottish and Irish Mc-/Mac- prefix together, since they are easily confused: thus McFlurry will be after McDonald, MacDonald, and Macdonald but before Mackintosh, McKintosh and so on. The group may be placed as a block, either before all other M-names or after Mab-, or it may be filed as if every name were spelled Mac-, in which case it is intermingled with names which start with the letters Mac- but are not etymologically related, e.g. Machin, Macy.
  • Letter groups treated as single letters: Not an issue in English, but in some languages letter groups which denote a distinct phoneme, including Spanish CH and LL and Welsh CH, DD, LL, and RH, are treated as letters in their own right. The traditional sorting method in Spanish put CH between C and D, and LL between L and M; this is now gradually falling out of fashion and the "new sort" orders CH and LL letter by letter, like any other pair of characters.
  • Diacritics: A few languages order letters with accents or other diacritics separately, including Finnish, where ä and ö come after Z at the end of the alphabet (making dictionaries surprisingly awkward to a novice foreigner), and Spanish, where Ñ is a letter in its own right and comes after N; others treat accented characters as identical to the unaccented letter. Accents also pose an issue for sorting in many computer languages, since basic string comparison operators may work purely on character codes, and accented characters have codes outside the ASCII A-Z range, so that they tend to land after Z rather than beside their base letters. Programmers may well need to investigate locale handling routines to resolve this.
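  • Case sensitivity: Only really an issue with character-code based computational sorting. In ASCII every upper-case letter has a lower code than every lower-case letter, so a raw comparison puts "Zebra" before "apple" unless case is folded first.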
  • Transliteration: When words are transliterated from different writing systems the same word may come in various forms: see the competing romanisation systems for Chinese (pinyin, Wade-Giles and the older postal spellings) that have brought us Peking/Beijing, or the vagaries of transmission of culture that have separated Chekhov and Tchaikovsky in English, although both names start with the same letter (Ч, che) in Russian. Clearly, it may well be desirable for different versions of a word or name to be grouped together under some circumstances.
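
The three treatments of spaces and punctuation listed above can be sketched as sort keys. This is a minimal Python illustration rather than a definitive implementation: the function names are invented for the example, and it assumes plain unaccented Latin text.

```python
import re

WORDS = ["wallet", "wall game", "wallaby", "wall-flower", "wall"]

def key_ignore(s):
    """Approach 1: drop everything except the letters a-z, then compare."""
    return re.sub(r"[^a-z]", "", s.lower())

def key_word_by_word(s):
    """Approach 2: split on spaces and hyphens and compare word by word."""
    return re.split(r"[ \-]+", s.lower())

def key_char_codes(s):
    """Approach 3: compare raw character codes, punctuation included."""
    return s

print(sorted(WORDS, key=key_ignore))
# ['wall', 'wallaby', 'wallet', 'wall-flower', 'wall game']
print(sorted(WORDS, key=key_word_by_word))
# ['wall', 'wall-flower', 'wall game', 'wallaby', 'wallet']
print(sorted(WORDS, key=key_char_codes))
# ['wall', 'wall game', 'wall-flower', 'wallaby', 'wallet']

# The name examples from the list behave as described:
print(sorted(["Smithers A.", "Smith Z."], key=key_word_by_word))
# ['Smith Z.', 'Smithers A.']
print(sorted(["Oates", "O'Grady"], key=key_char_codes))
# ["O'Grady", 'Oates']
```

Treating the hyphen as a word break in approach 2 is a choice made for this sketch; removing it from the pattern would file "Smith-Fitzgerald" as a single word instead.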

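The points about name prefixes, letter groups, diacritics and case can be sketched the same way. The Python below is only an illustration under stated assumptions: fold, surname_key and spanish_traditional_key are hypothetical helper names, the Mc-/Mac- rule shown is the "file everything as Mac-" variant (so Machin and Macy would be intermingled), and stripping accents is wrong for languages such as Finnish, where a locale-aware collation routine is needed instead.

```python
import unicodedata

def fold(s):
    """Case-fold and strip combining accents, so 'É' compares like 'e'."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def surname_key(name):
    """File a leading 'Mc' as if it were spelled 'Mac'."""
    folded = fold(name)
    if folded.startswith("mc"):
        folded = "mac" + folded[2:]
    return (folded, name)  # the raw spelling only breaks ties

# Traditional Spanish alphabet, with CH, LL and Ñ as letters of their own.
SPANISH_LETTERS = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j",
                   "k", "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s",
                   "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(SPANISH_LETTERS)}

def spanish_traditional_key(word):
    """Turn a word into a list of letter ranks, reading CH and LL as units."""
    w = unicodedata.normalize("NFC", word.lower())
    ranks, i = [], 0
    while i < len(w):
        if w[i:i + 2] in ("ch", "ll"):
            ranks.append(RANK[w[i:i + 2]])
            i += 2
        else:
            # Characters outside the table (accented vowels, punctuation)
            # are simply placed last in this simplified sketch.
            ranks.append(RANK.get(w[i], len(RANK)))
            i += 1
    return ranks

print(sorted(["Zebra", "apple", "Émile"]))
# ['Zebra', 'apple', 'Émile']   (raw character codes)
print(sorted(["Zebra", "apple", "Émile"], key=fold))
# ['apple', 'Émile', 'Zebra']

names = ["Mackintosh", "McFlurry", "Macdonald", "McDonald", "MacDonald", "McKintosh"]
print(sorted(names, key=surname_key))
# ['MacDonald', 'Macdonald', 'McDonald', 'McFlurry', 'Mackintosh', 'McKintosh']

palabras = ["chico", "cuna", "dama", "llama", "luz", "mano", "nube", "ñu"]
print(sorted(palabras, key=spanish_traditional_key))
# ['cuna', 'chico', 'dama', 'luz', 'llama', 'mano', 'nube', 'ñu']
```
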
The surgeon-general has determined that taking any sort of job in a library is likely to leave you warped for life.