A more rigorous representation of protein topology than a TOPS cartoon. Instead of a fully planar graph, the diagram is linear:
__H:P__ / \
/ \ / \
/ \/ ____ __\___ ____
/\ /\ / \ \ / / \ __
|\ | __/ \_______/ \______/ \____\ /_____/ \____ /
| \| / \ / \ \ / \ / \ / \__
/______\ /______\ \____/ \/ \____/
...and the labelled edges are hydrogen bonds (H:P or H:A for parallel or anti-parallel) and chiralities (C:R and C:L for right and left). The backbone is the straight line through the center - obviously, some information has been lost.
Indeed, diagrams are not much use as a visualisation tool (which is what the cartoons are for). Instead, they are a type of graph (an ordered, directed one). This means they are amenable to subgraph isomorphism matching, and therefore to machine learning techniques built on it. A generalisation of the TOPS diagram is the pattern.
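To make the idea of matching on an ordered graph concrete, here is a minimal brute-force sketch. It is not the matching algorithm used by any TOPS software. The encoding is an assumption made for illustration: vertices are a string with one character per element (including any end markers), and edges are a dictionary from index pairs, used as 0-based positions in that string, to one-letter labels. The function name find_matches is hypothetical.

```python
from itertools import combinations


def find_matches(pattern_vertices, pattern_edges, target_vertices, target_edges):
    """Return every order-preserving placement of the pattern in the target.

    Vertices are strings with one character per element; edges are dicts
    mapping (i, j) index pairs to labels. Both encodings are assumptions
    made for this sketch.
    """
    found = []
    size = len(pattern_vertices)
    # combinations() yields index tuples in increasing order, so the
    # backbone order of the pattern is preserved automatically.
    for positions in combinations(range(len(target_vertices)), size):
        # Vertex types must agree at every chosen position.
        if any(target_vertices[t] != pattern_vertices[p]
               for p, t in enumerate(positions)):
            continue
        # Every pattern edge must exist in the target with the same label.
        if all(target_edges.get((positions[i], positions[j])) == label
               for (i, j), label in pattern_edges.items()):
            found.append(positions)
    return found


# Hypothetical target graph and a two-vertex pattern joined by a 'P' edge.
target_vertices = "NEEHehC"
target_edges = {(1, 2): "P", (2, 5): "A", (3, 4): "R"}
print(find_matches("EE", {(0, 1): "P"}, target_vertices, target_edges))
```

Trying every combination of positions is exponential in the size of the pattern, so this is only practical for small graphs; it is meant to show what a match is, not how to find one efficiently.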
An even more simplified representation is the string form, where strands are 'E' or 'e' (up and down) and helices are 'H' and 'h'. Edges in the graph are denoted by integer pairs, separated with a ':' and terminated by a label. For the example graph shown above, this is:
Example NEEHehC 1:2P2:5A3:4R
While this representation has the disadvantage of being even less readable than the linear graph, it is compact. This means that tens of thousands of protein secondary structures can be stored in a single file.
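As a rough illustration of how compact the string form is to handle, the following sketch splits one line into its name, vertex string and edge list. It assumes the three fields are separated by whitespace and that edge labels are single letters from P, A, R and L, as in the example; the 'N' and 'C' characters presumably mark the chain termini. The names Edge and parse_tops_string are hypothetical, and the indices are returned as written, without assuming a numbering convention.

```python
import re
from typing import List, NamedTuple, Tuple


class Edge(NamedTuple):
    left: int
    right: int
    label: str


def parse_tops_string(line: str) -> Tuple[str, str, List[Edge]]:
    """Split 'name vertices edges' into a name, a vertex string and edges."""
    parts = line.split()
    name, vertices = parts[0], parts[1]
    # A structure with no edges is assumed to simply omit the third field.
    edge_text = parts[2] if len(parts) > 2 else ""
    # Each edge is written as <int>:<int><label>, with no separator between edges.
    edges = [Edge(int(i), int(j), label)
             for i, j, label in re.findall(r"(\d+):(\d+)([PARL])", edge_text)]
    return name, vertices, edges


name, vertices, edges = parse_tops_string("Example NEEHehC 1:2P2:5A3:4R")
print(name, vertices)
for edge in edges:
    print(edge.left, edge.right, edge.label)
```

Because each structure occupies one line, a file of many structures can be read by applying the same function to every line.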