There's more than one way to skin a cat, or so they say. And most languages, whether computer or natural, will let you express the same thing in more than one way. HTML, though less a language than an encoding, is no exception: any old bit of text can be written in more than one way and still render identically.

Apart from HTML's habit of collapsing whitespace (allowing endless permutations of source code with variable numbers of spaces and line breaks), there's redundancy in the representation of characters. Specifically, to render a normal character using HTML, the character can be written either in plain ASCII (or, to be picky, ISO 8859) or as the numeric HTML entity for that character's ASCII code. For example, 'A' can be produced by either A or &#65;.

Obviously, it's better to use the plain ASCII, for at least two reasons. Firstly, you stand a chance of actually being able to read the source of your HTML; secondly, it's much more compact. Plain ASCII takes one byte per character; the numeric entity for a printable ASCII character takes five or six.
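To put rough numbers on that, here is a minimal sketch that builds the decimal entity for a character and compares lengths. The name numeric_ref is made up for this illustration, and it assumes plain ASCII input, where one character is one byte.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the decimal numeric entity for a single character,
# e.g. 'A' -> '&#65;'. (numeric_ref is a hypothetical helper name.)
sub numeric_ref {
  my ($c) = @_;
  return "&#" . ord($c) . ";";
}

# Compare the plain form with the entity form for a few ASCII characters.
foreach my $c ('A', 'e', ' ') {
  my $ref = numeric_ref($c);
  printf "'%s': plain %d byte, entity %s is %d bytes\n",
    $c, length($c), $ref, length($ref);
}
```

Codes below 100 (such as 65 for 'A' or 32 for a space) give five-byte entities; codes from 100 upwards (such as 101 for 'e') give six.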

The practical upshot of all this? Well, not much really. Here on E2, node titles are all stored textually in the HTML source. A node title containing entities is not the same as one containing the equivalent ASCII characters. So here, purely in the interests of science, is a small Perl script which writes out all of the 5242880 different possible HTML representations of Butterfinger McFlurry.

#!/usr/bin/perl
use strict;
use warnings;

sub p {
  my ($prefix, $chars) = @_;
  if (length($chars) == 0) {
    # Nothing left to be processed => done!
    print "$prefix\n";
    return;
  }

  # Split first character off of right-hand string.
  my ($c, $cs) = (substr($chars, 0, 1), substr($chars, 1));

  # Alternatives for first character: plain ASCII or HTML entity.
  my @alternatives = ($c, "&#".ord($c).";");

  # Space can also be a non-breaking space, or any of its variations.
  if ($c eq ' ') {
    push @alternatives, (chr(160), "&nbsp;", "&#160;");
  }
  # Produce complete variations for each alternative.
  foreach (@alternatives) {
    p($prefix.$_, $cs);
  }

}

p("", "Butterfinger McFlurry");
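The figure of 5242880 can be checked without printing a single variant: the total is just the product of the number of alternatives for each character. A minimal sketch, assuming (as the script does) five alternatives for a space and two for everything else; count_variants is a name invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the variants instead of generating them. Assumes 5 alternatives
# for a space and 2 for any other character, matching the script above.
# (count_variants is a hypothetical helper name.)
sub count_variants {
  my ($title) = @_;
  my $total = 1;
  foreach my $c (split //, $title) {
    $total *= ($c eq ' ') ? 5 : 2;
  }
  return $total;
}

print count_variants("Butterfinger McFlurry"), "\n";   # prints 5242880
```

Twenty ordinary characters give 2^20 = 1048576 combinations, and the single space multiplies that by five.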

Not that this is of any practical value, of course. Unless some nefarious individual wanted to create five million completely different 'Butterfinger McFlurry' nodeshells. Which would be silly. And annoying. Umm... but fun.


Colin will not be held responsible for any use of this script for purposes other than those for which it was intended. Umm. Or any purpose whatsoever, actually.
