SMILES (Simplified Molecular Input Line Entry System) is David Weininger's chemical shorthand. It is not very convenient for chemists to read, but it is very useful for computers and chemical databases. It can (with very few exceptions) encode any chemical, along with its charge and stereochemistry. For example:
- Benzene: c1ccccc1
       H         H
        \       /
         C1---C1
        //     \\
  H -- C         C -- H
        \       /
         C == C
        /      \
       H        H
- Phenol: c1ccccc1(O)
       H         OH
        \       /
         C1---C1
        //     \\
  H -- C         C -- H
        \       /
         C == C
        /      \
       H        H
- p-Cresol: c1cc(C)ccc1(O)
       H         OH
        \       /
         C1---C1
        //     \\
  H -- C         C -- H
        \       /
         C == C
        /      \
     H3C        H
- Cyclobutadiene: C1=CC=C1
    H        H
     \      /
      C1 = C
      |    |
      C1 = C
     /      \
    H        H
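Here c is an aromatic carbon and C an aliphatic one. The digits mark ring closures (the two atoms carrying the same digit are bonded to each other, as with the two atoms labelled C1 in the diagrams) and the parentheses mark branches. Hydrogens are left implicit. Double bonds are written as equals signs and triple bonds as # symbols.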
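To make the notation concrete, here is a minimal sketch of a reader for a small subset of SMILES. It is not a full parser: it assumes one-letter atoms only (no Cl, Br, charges or square brackets), single-digit ring closures, and it treats any bond between two lowercase atoms as aromatic (order 1.5), which is a simplification. The names parse_smiles, AROMATIC and ATOMS are made up for this illustration.

```python
AROMATIC = set("bcnops")
ATOMS = set("BCNOPSFI") | AROMATIC   # one-letter atoms only (assumption)


def parse_smiles(smiles):
    """Return (atoms, bonds) for a simple SMILES string.

    atoms is a list of symbols; bonds is a list of (i, j, order) tuples.
    """
    atoms, bonds = [], []
    prev = None          # atom the next atom will be bonded to
    order = None         # explicit bond order seen since the last atom
    branch_stack = []    # atoms to return to when a branch closes
    open_rings = {}      # ring digit -> (atom index, explicit order)

    def bond_order(a, b, explicit):
        if explicit is not None:
            return explicit
        both_aromatic = atoms[a] in AROMATIC and atoms[b] in AROMATIC
        return 1.5 if both_aromatic else 1

    for ch in smiles:
        if ch in ATOMS:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, bond_order(prev, idx, order)))
            prev, order = idx, None
        elif ch == "=":
            order = 2
        elif ch == "#":
            order = 3
        elif ch == "(":
            branch_stack.append(prev)      # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()      # go back to the branch point
        elif ch.isdigit() and prev is not None:
            if ch in open_rings:           # second use of the digit: close the ring
                start, start_order = open_rings.pop(ch)
                explicit = order if order is not None else start_order
                bonds.append((start, prev, bond_order(start, prev, explicit)))
            else:                          # first use: remember where the ring opens
                open_rings[ch] = (prev, order)
            order = None
        else:
            raise ValueError(f"unsupported character {ch!r}")
    if open_rings or branch_stack:
        raise ValueError("unclosed ring or branch")
    return atoms, bonds
```

For cyclobutadiene, parse_smiles("C1=CC=C1") gives four C atoms and the bonds (0, 1, 2), (1, 2, 1), (2, 3, 2) and (0, 3, 1): the last one is the ring closure that the digit 1 stands for.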
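Chirality is more difficult, but N[C@@H](C)C(=O)O is L-alanine. The @ symbols indicate chirality: looking at the chiral centre from the first neighbour written, the remaining neighbours (an implicit hydrogen inside the brackets counts first) appear anticlockwise for @ and clockwise for @@ (apparently, @ looks like an anticlockwise spiral, and @@ is anti-anti-clockwise :)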
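Because @ and @@ depend on the order in which the neighbours are written, the same stereocentre can be spelled several ways. A rough sketch of the bookkeeping, assuming the usual rule that swapping any two neighbours flips the handedness: two spellings agree if the neighbour orders differ by an even permutation and the tags match, or by an odd permutation and the tags differ. The function names and the neighbour labels ("N", "H", "CH3", "COOH") are illustrative only.

```python
def permutation_parity(seq_a, seq_b):
    """Return 0 if seq_b is an even permutation of seq_a, 1 if it is odd."""
    if sorted(seq_a) != sorted(seq_b) or len(set(seq_a)) != len(seq_a):
        raise ValueError("sequences must contain the same distinct items")
    position = {item: i for i, item in enumerate(seq_a)}
    perm = [position[item] for item in seq_b]
    # Count inversions; their number has the same parity as the permutation.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def same_chirality(neigh_a, tag_a, neigh_b, tag_b):
    """Compare two spellings of one stereocentre.

    neigh_* list the neighbours in written order (the implicit H of
    [C@@H] comes right after the preceding atom); tag_* is "@" or "@@".
    """
    flipped = permutation_parity(neigh_a, neigh_b) == 1
    return (tag_a == tag_b) != flipped


# N[C@@H](C)C(=O)O, used here as the reference spelling
reference = (["N", "H", "CH3", "COOH"], "@@")

# C[C@@H](C(=O)O)N : even permutation, same tag
print(same_chirality(*reference, ["CH3", "H", "COOH", "N"], "@@"))
# N[C@H](C(=O)O)C : odd permutation, opposite tag
print(same_chirality(*reference, ["N", "H", "COOH", "CH3"], "@"))
# N[C@@H](C(=O)O)C : odd permutation, same tag, so the mirror image
print(same_chirality(*reference, ["N", "H", "COOH", "CH3"], "@@"))
```

The first two calls print True and the third prints False.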
The process of converting a molecule into a SMILES string is probably best done by software. However, conceptually it is very simple: turn the structure into a chain with branches by cutting each ring open at one bond, and label the two atoms at each cut with the same digit. If an atom is part of two cut bonds (e.g. tetrahedrane, which is fully 3D) then it gets two digits (like C13, which means "1 and 3", not "13"; a genuinely two-digit label is written with a percent sign, as in C%13). Disconnected parts such as the ions of a salt can be written side by side, separated by a dot (for example [Na+].[Cl-]), but as far as I know there is no way to specify hydrogen bonds.
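The ring-cutting idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular toolkit does it: the molecule is given as a list of atom symbols plus a list of bonded index pairs, every bond is single, the molecule is connected, hydrogens are implicit, and there are at most nine cuts. A depth-first walk from atom 0 decides which bonds stay in the chain; any bond that leads back to an atom already visited is a cut and gets the next free digit. The function name write_smiles is made up for this example, and the output is valid but not canonical.

```python
def write_smiles(atoms, bonds):
    """Write a non-canonical SMILES string for a connected, single-bonded graph."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    parent = {0: None}                       # spanning tree built by the walk
    children = {i: [] for i in neighbours}
    closures = {i: [] for i in neighbours}   # atom -> ring-closure digits
    seen_cuts = set()
    next_digit = 1

    def walk(atom):
        nonlocal next_digit
        for nb in neighbours[atom]:
            if nb == parent[atom]:
                continue
            if nb not in parent:             # unvisited: the bond stays in the chain
                parent[nb] = atom
                children[atom].append(nb)
                walk(nb)
            elif frozenset((atom, nb)) not in seen_cuts:
                # Bond back to a visited atom: cut it and label both ends.
                if next_digit > 9:
                    raise ValueError("more than nine ring closures")
                seen_cuts.add(frozenset((atom, nb)))
                closures[atom].append(next_digit)
                closures[nb].append(next_digit)
                next_digit += 1

    def emit(atom):
        text = atoms[atom] + "".join(str(d) for d in closures[atom])
        kids = children[atom]
        for kid in kids[:-1]:                # all but the last child are branches
            text += "(" + emit(kid) + ")"
        if kids:
            text += emit(kids[-1])           # the last child continues the chain
        return text

    walk(0)
    return emit(0)


# Tetrahedrane: four carbons, every pair bonded.
tetrahedrane_bonds = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(write_smiles(["C"] * 4, tetrahedrane_bonds))   # C12C3C1C23
```

In the tetrahedrane result the first atom carries the digits 1 and 2 and the last carries 2 and 3, which is the "two digits on one atom" case described above.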