The oft-heard lament of those using wildcard expansion in a shell when they suddenly discover that, while the shell's wildcards are easy to type, they make some things impossible that would be easy with real regular expressions.

For example, all of these are easy:

  • "*.txt" matches everything ending in ".txt"; this is the same as the regular expression (or regexp) "\.txt$".
  • "readme*.txt" matches everything beginning "readme" and ending ".txt", including "readme.txt" (if such exists); this is the same as the regexp "^readme.*\.txt$".
  • "*.[0-9][0-9][0-9]" matches everything with a 3-digit extension at the end; this is the same as the regexp "\.[0-9]{3}$" (match anything ending in a "." followed by 3 matches of a digit 0-9).
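The equivalences above can be checked with a small sketch. It assumes Python's fnmatch module is a fair stand-in for the shell's wildcard matching (a real shell also treats leading dots and slashes specially), and the file names are made up for illustration.

```python
import fnmatch
import re

# Hypothetical file names, chosen only for illustration.
names = ["readme.txt", "readme_old.txt", "notes.txt",
         "data.001", "data.01", "readme.md"]

# Each wildcard from the list above, paired with the regexp given for it.
pairs = [
    ("*.txt", r"\.txt$"),
    ("readme*.txt", r"^readme.*\.txt$"),
    ("*.[0-9][0-9][0-9]", r"\.[0-9]{3}$"),
]

def glob_matches(pattern, candidates):
    """Names accepted by the wildcard (the pattern must cover the whole name)."""
    return [n for n in candidates if fnmatch.fnmatchcase(n, pattern)]

def regexp_matches(pattern, candidates):
    """Names accepted by the regexp (anchoring is up to the pattern itself)."""
    return [n for n in candidates if re.search(pattern, n)]

for glob, regexp in pairs:
    print(glob, glob_matches(glob, names))
    print(regexp, regexp_matches(regexp, names))
```

Each pair should print the same list twice. Note that a wildcard always has to cover the whole name, while a regexp only does so when it is anchored with "^" and "$".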

But all of these are hard with wildcards (and, if we try to match strings of unbounded length, impossible):

  • "^ab*c$" matches all strings starting with an "a" and ending with a "c", with only "b"s (possibly none) in between.
  • "^readme[1-9][0-9]*\.txt$" matches everything of the form "readme17.txt", where "17" may be replaced by any positive number written without a leading zero; "^readme[0-9]+\.txt$" would also allow these numbers to start with the digit "0".
  • "^[A-Z]+[a-z][0-9]$" matches everything consisting of one or more uppercase letters followed by a single lowercase letter followed by a single digit.
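For comparison, here is a minimal sketch of the same patterns using Python's re module, with every pattern closed by "$". The sample strings and the helper name accepts are made up for illustration.

```python
import re

# The regexps from the list above.
AB_C = re.compile(r"^ab*c$")
README_N = re.compile(r"^readme[1-9][0-9]*\.txt$")
README_N0 = re.compile(r"^readme[0-9]+\.txt$")
UPPER_LOWER_DIGIT = re.compile(r"^[A-Z]+[a-z][0-9]$")

def accepts(regexp, text):
    """Return True if the compiled regexp matches the text."""
    return regexp.search(text) is not None

# Any number of b's between the a and the c, including none.
for text in ["ac", "abbbbbbc", "abxc"]:
    print(text, accepts(AB_C, text))

# A leading zero is rejected by the first pattern but allowed by the second.
for text in ["readme17.txt", "readme07.txt", "readme.txt"]:
    print(text, accepts(README_N, text), accepts(README_N0, text))

# Uppercase run of any length, then exactly one lowercase letter and one digit.
for text in ["ABCd5", "Ad5", "ABcd5", "d5"]:
    print(text, accepts(UPPER_LOWER_DIGIT, text))
```

The wildcard "ab*c" is no substitute for the first pattern: it would also accept "abxc", because its "*" stands for any run of characters rather than for repeated b's.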

An earlier stage: utter confusion from users who, familiar with wildcards (aka filename globbing), encounter regular expressions for the first time and don't realize that in a regexp * and ? are operators applied to whatever precedes them, rather than wildcards standing in for arbitrary characters.
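The confusion is easy to reproduce by reading the same characters both ways. This is a sketch only: it again assumes fnmatch as a stand-in for shell globbing, and the helper names glob_ok and regexp_ok and the sample strings are invented for illustration.

```python
import fnmatch
import re

def glob_ok(pattern, name):
    """Read the pattern as a wildcard, as filename globbing would."""
    return fnmatch.fnmatchcase(name, pattern)

def regexp_ok(pattern, text):
    """Read the pattern as a regexp that must cover the whole string."""
    return re.fullmatch(pattern, text) is not None

# "*" as a wildcard: any run of characters. As an operator: repeat the "b".
for s in ["ac", "abbc", "abxyzc"]:
    print(s, glob_ok("ab*c", s), regexp_ok("ab*c", s))

# "?" as a wildcard: exactly one character. As an operator: the "e" is optional.
for s in ["file1.txt", "file.txt", "fil.txt"]:
    print(s, glob_ok("file?.txt", s), regexp_ok(r"file?\.txt", s))
```

Only "abbc" is accepted under both readings of "ab*c", and no sample string is accepted under both readings of "file?.txt".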

This is one of the things that makes regular expressions 'hard': familiar symbols behave in an unexpected way. But it's not the notation that's complex, it's what you can express with it: the complexity of regular expressions is appropriate.
