A perl idiom of exceeding power and grace, the Schwartzian transform is a method for sorting a data structure on an arbitrary key efficiently1 and with a minimum amount of perl code.

The classic ST always takes the form of

     map { } sort { } map { }
but, in general, any sequence of cascaded maps is considered to be a Schwartzian transform. The canonical ST2 is
    @new = map { $_->[0] }
        sort { $b->[1] <=> $a->[1]
                       ||
               $a->[2] cmp $b->[2]
        } map { [$_, /=(\d+)/, uc($_)] } @old;

This3 ST sorts a list of files by size

   @sorted_by_size =
       map { $_->[0] }
       sort { $a->[1] <=> $b->[1] }
       map { [$_, -s] }
          @files;
but with only one call to stat(2) per file rather than the n log n calls you'd get from this
  @sorted_by_size = sort { -s $a <=> -s $b } @files;

It is important to note that using a Schwartzian transform doesn't automatically guarantee that your code is going to be fast or efficient. Nothing can help you if you write code like this

map {
  for my $item (@some_big_ass_list) {
    do_something_expensive($_, $item);
  }
} @some_big_ass_list;

In principle, any for() loop can be turned into an ST, but sometimes amazing contortions are required. This example should have been left as a for loop.

my $o;
map {
  $o = shift;
  for (@$_) {
    ($_->{name})=$o->fetch_name();
    ($_->{hist})=$o->fetch_hist();
    prep_final($_);    # one $_
  }
  finish($_);          # a different $_
} map { [ $_, @{$_->fetch_list()}]} @list_of_objects;

1 Where efficiently is defined as 'with no named temps'
2 online documentation for perl 5.6.0
3 From Effective Perl Programming by Joseph Hall

The name given to a programming idiom used when a list of records is to be re-ordered according to a function of the record, rather than the value of the record itself. It is named[1] after Randal Schwartz, who popularized its use in Perl, where it is often implemented as a one-liner, though the concept is language-independent and the first use of it is lost in the shrouded mists of time.

The idea is very simple. Rather than sorting the list of records using a comparison function that computes the function of the two given records and returns the comparison value of the two function values, a new list is created, each element of which is a record consisting of the function's value for the corresponding record in the original list, and that record itself. Thus, an original list

R1, R2, R3, ...
is converted to
(f(R1), R1), (f(R2), R2), (f(R3), R3), ...
That list is sorted, and then converted again, removing the first item from each element, which produces the desired ordered list.

Note that the point of this exercise is to compute the function's value for each record only once; the simplistic approach of computing the value in the comparison function can compute the same function for the same record many times, depending on the size of the list, the sorting algorithm used, and the implementation thereof. While this is done in order to speed up the entire sorting process, and it generally will, it is possible for this sorting to take longer because of the fact that a bigger list is being sorted than in the simplistic case, which may raise swapping issues, though that would be a pathological case.

Also, when used in interpreted languages such as Perl or Python, another source of speedup is that the supplied sort function is generally faster (much faster in the case of Python) if it is called without a user-defined comparison function.

Here is a simple Python function which sorts a list using a Schwartzian Transform.

def STsort(list, f):
    """
    STsort(list, f) --> list
    Sorts the given list using a Schwartzian Transform with function f
    """

    list = [ (f(item), item) for item in list ]
    list.sort()
    return [ item[1] for item in list ]

This could be used, for example, to sort a list of names of people according to how old they are, by looking up their age in a database (a relatively slow operation that you wouldn't want to repeat).

   People = ['Nolan', 'Clarence', 'Rebecca']
   def Age(Person):
       return DBLookup(Person).age
   ByAge = STsort(People, Age)

which would return ['Rebecca', 'Nolan', 'Clarence']. It's just as easy to sort according to the number of vowels in the name:

   def nVowels(Name):
       i = 0
       for c in Name: i += c in 'aeiou'
       return i
    ByVowels = STsort(People, nVowels)

which would return ['Nolan', 'Clarence', 'Rebecca'].

Update, 1/16/2005

With the advent of Python version 2.4, released November, 2004, the machinery of the Schwartzian Transform is built in to the sort method of list objects. It can, of course, be used just as before by either specifying no argument for a sort using native comparisons of the list elements, or by specifying a comparison function to use instead. That argument is now formally called cmp, because other optional arguments have been added.

The one that concerns us here is key. This argument receives a function that takes one argument, a value from the list, and returns the sort key to be associated with it. Before sorting the list, key is called for each element, and the list of those keys is sorted instead; then the list is re-ordered according to the sorted keys.

Thus, to use the above examples, one can get the ST effect simply with People.sort(key=Age).

There is one difference to note however: the standard list sort method, since version 2.3, guarantees a stable sort (which means that two elements of the list having the same key value will appear in the same order relative to each other after sorting as before). In the STsort function above, we see that the key and the element are treated together as one element of the list that was actually sorted, which meant (according to Python's standard tuple comparison rules) the original value acted as a tie-breaker among elements with the same key value. The difference can be seen here:

    People = ['Rebecca', 'Clarence', 'Nolan']
    # The nVowels values are 3, 3, and 2 respectively
    print STsort(People, nVowels)
    #   (prints ['Nolan', 'Clarence', 'Rebecca'])
    People.sort(key=nVowels)
    print People
    #   (prints ['Nolan', 'Rebecca', 'Clarence'])


[1] According to m_turner, this construct was christened by Tom Christiansen, in a Usenet conversation in which he derided it.

Log in or register to write something here or to contact authors.