There's more than one way to skin a cat, or so they say. And most
languages, whether computer or natural, will let
you express the same thing in more than one way. HTML, though less
a language than an encoding, is no exception: any old bit of text
can be represented in more than one way and still render
identically.
Apart from HTML's habit of collapsing runs of whitespace
(allowing endless permutations of source code with variable numbers
of spaces and line breaks), there's redundancy in the representation
of characters. Specifically, a normal character can be written in HTML
either as the character itself in plain ASCII (or, to be picky,
ISO 8859), or as the numerical HTML entity for that character's
ASCII code. For example, 'A' can be produced by either A
or &#65;.
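As a minimal sketch of that equivalence, the numeric entity for a character can be built from its character code with Perl's built-in ord. The helper name entity_for is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# entity_for is a hypothetical helper: it returns the decimal numeric
# entity for the single character passed in.
sub entity_for {
    my ($char) = @_;
    return "&#" . ord($char) . ";";
}

print entity_for('A'), "\n";   # prints &#65;
print entity_for(' '), "\n";   # prints &#32;
```

A browser should render either form as the same glyph, but as strings they are plainly different.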
Obviously, it's better to use the plain ASCII, for at least two
reasons. Firstly, you stand a chance of actually being able to read
the source of your HTML; secondly, it's much more compact. Plain ASCII
takes one byte per character; a numeric entity takes five or six.
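To put rough numbers on the size difference, here is a small sketch that adds up the bytes needed when every character of a string is written as a decimal entity. The helper name entity_size is an assumption for this example, and the count only covers the entity text itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# entity_size is a hypothetical helper: total length of a string when
# each character is written as a decimal numeric entity.
sub entity_size {
    my ($text) = @_;
    my $total = 0;
    $total += length("&#" . ord($_) . ";") for split //, $text;
    return $total;
}

my $title = "Butterfinger McFlurry";
printf "plain: %d bytes, all entities: %d bytes\n",
    length($title), entity_size($title);
# prints: plain: 21 bytes, all entities: 121 bytes
```

Characters with a two-digit code (such as 'B', 66) cost five bytes each, and those with a three-digit code (such as 'u', 117) cost six.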
The practical upshot of all this? Well, not much really. Here on E2,
node titles are all stored textually in the HTML source. A node title
containing entities is not the same as one containing the equivalent
ASCII characters. So here, purely in the interests of science, is a small
Perl script which writes out all 5242880 possible HTML
representations of Butterfinger McFlurry.
#!/usr/bin/perl
use strict;
use warnings;

# Print every HTML representation of $chars, each prefixed by $prefix.
sub p {
    my ($prefix, $chars) = @_;
    if (length($chars) == 0) {
        # Nothing left to be processed => done!
        print "$prefix\n";
        return;
    }
    # Split first character off of right-hand string.
    my ($c, $cs) = (substr($chars, 0, 1), substr($chars, 1));
    # Alternatives for first character: plain ASCII or HTML entity.
    my @alternatives = ($c, "&#" . ord($c) . ";");
    # Space can also be non-breaking space, or any of its variations.
    if ($c eq ' ') {
        push @alternatives, (chr(160), "&nbsp;", "&#160;");
    }
    # Produce complete variations for each alternative.
    foreach my $alt (@alternatives) {
        p($prefix . $alt, $cs);
    }
}

p("", "Butterfinger McFlurry");
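The figure of 5242880 can be checked without generating a single line: multiply together the number of alternatives for each character. The sketch below assumes the same rules as the script (five forms for a space, two for anything else); count_variants is a name invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_variants is a hypothetical helper: it counts representations by
# multiplying the number of alternatives available for each character.
sub count_variants {
    my ($text) = @_;
    my $count = 1;
    for my $c (split //, $text) {
        # A space has five forms (the character itself, its numeric
        # entity, and three non-breaking variants); others have two.
        $count *= ($c eq ' ') ? 5 : 2;
    }
    return $count;
}

print count_variants("Butterfinger McFlurry"), "\n";   # prints 5242880
```

With twenty non-space characters and one space, that works out to 2^20 * 5 = 5242880.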
Not that this is of any practical value, of course. Unless
some nefarious individual wanted to create five million completely
different 'Butterfinger McFlurry' nodeshells. Which would be silly. And
annoying. Umm... but fun.
Colin will not be held responsible for any use of this
script for purposes other than those for which it was intended. Umm. Or
any purpose whatsoever, actually.