### Regular Expressions:

This

node serves as an introduction to

regular expressions.

It covers concatenation, the union operator

**'|'** and star

**'*'**
These are the only

necessary operators.

Other Operators simply make the expression more simple, but do not add to the expressive

power of the expressions themselves.

Consider a

language to be a

set of

elements.

An

*expression* is used to describe the elements of the set without explicitly

listing them.

Consider a

*regular* language as a set of

strings
A

*regular expression* is used to describe all strings in the set without explicitly listing every possible one.

##### Examples:

{"Do"} is a language, or a set containing one element. That element is
the string, "Do".

A

suitable expression for this language would be:

**Do**
{"Do","Die"} is another language. It consists of two strings, "Do" and
"Die".

A suitable expression for this language would be:

**Do|Die**
This says that if a string is made up of either the

sequence of

chars "D
o" or the sequence of chars "Die", then it is in the language .

{"Do","Die","Dig"} similarly can be expressed by:

**Do|Die|Dig**
Now consider the language, {"Dos","Dies","Digs"}

In the same manner as above, we can describe this language with the expression:

**Dos|Dies|Digs**
Notice that there is a

common char 's' at the end of each string. By

concatenation, this can be distributed over the expression:

**(Dos|Dies|Digs) ** is equivalent to

**(Do|Die|Dig)s**
Also, notice that there is a common char 'D' at the beginning of each string. This can be similarly

distributed over the expression:

**(Do|Die|Dig)s** is equivalent to

**D(o|ie|ig)s**
This latest expression says that every string in the language begins with a D, which is followed by one of ("o","ie","ig") and ends with an s.
If a string does not

follow this

form, it is not in the language described by the expression.

Notice that the expression:

**D(o|ie|ig)s** is

*also* equivalent to

**D(o|(i(e|g)))s**
All of the languages mentioned so far have a finite number of elements.

Regular expressions can also describe languages that are made up of an infinite number of elements.

Consider the language,

{"Dos","Doso",Dosoo","Dosoo",...}

**U**
{"Dies","Dieso","Diesoo","Diesooo",...}

**U**
{"Digs","Digso","Digsoo","Digsooo",...}

This contains all elements described by the regular expression

**
D(o|ie|g)s** as well as all strings that start with one of these members, followed by any number of the letter "o".

A regular expression that describes all strings consisting of zero or more "o"'s would be:

**o***
This describes the set {

*e*,"o","oo","ooo",...}

We can express the concatenation of this set with the last set with the
regular expression:

**D(o|ie|g)so***
##### More regular expressions:

One digit, 0 Through 9:

**(0|1|2|3|4|5 ... |9)** or (0-9)

At least one digit:

**(0-9)(0-9)***
All decimals:

**(0-9)(0-9)*("."|***e*)(0-9)*
A regular expresion over {0,1} where "0" is the third digit from the last:

**(0|1)*0(0|1)(0|1)**
A regular expresion over {0,1} where "0" is the

*fourth* digit from the last:

**(0|1)*0(0|1)(0|1)(0|1)**
A string consisting of an even number of ones and zeros

**(***e*|01|10|00|11)* *or* ((0|1)(0|1))*