A given subalphabet is evaluated by measuring how well the structure can be predicted from the recoded version of the original sequence.
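To make the idea of a "recoded version" concrete, here is a minimal sketch of how a sequence could be rewritten in a reduced alphabet. The function name `recode`, the choice of the first residue of each group as its symbol, and the three example groups are assumptions made for illustration only; they are not the groupings reported in the article.

```python
def recode(sequence, groups):
    """Rewrite a sequence so every residue is replaced by its group's symbol.

    Each group is a string of one-letter residue codes; the first letter of
    the group is used as the symbol for the whole group (an arbitrary choice).
    """
    symbol_of = {}
    for group in groups:
        representative = group[0]
        for residue in group:
            symbol_of[residue] = representative
    return "".join(symbol_of[residue] for residue in sequence)


# Hypothetical partition covering only a handful of residues.
example_groups = ["AVLI", "DE", "KR"]
recoded = recode("AVKDLERI", example_groups)
print(recoded)  # AAKDADKA
```

A structure predictor would then be trained and tested on the recoded sequences instead of the original 20-letter ones.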
Interestingly, below 13 clusters, the cost of merging two groups increases quite abruptly at certain subalphabet sizes.
When the sequence-specific performance for the optimal subalphabet of 13 groups (measured by GC) was plotted against the amino acid distribution entropy, $-\sum_l p_l \log_2(p_l)$, it was clear that lower-complexity sequences with a biased composition were not easier to predict (data not shown).
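The composition entropy referred to here is the standard Shannon entropy of the residue frequencies, $H = -\sum_l p_l \log_2 p_l$. The sketch below shows one way to compute it for a single sequence. The function name `composition_entropy` is an assumption, and frequencies are estimated simply as counts divided by sequence length.

```python
import math
from collections import Counter


def composition_entropy(sequence):
    """Shannon entropy (in bits) of the residue composition of one sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # Residues that do not occur contribute nothing to the sum.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


low_complexity = composition_entropy("AAAAAAAC")
uniform_four = composition_entropy("ACDE")
```

A sequence that uses all 20 amino acids equally often reaches the maximum of $\log_2 20 \approx 4.32$ bits; biased, low-complexity sequences score lower.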
Incorporating sequence correlations into a protein subalphabet evaluation is impossible when the approach is based on a simple matrix.
In this article, we do not attempt to cover every computational representation of protein sequences. Instead, we restrict the review to recent developments in the area of amino acid subalphabets, where the idea is to discover groups of amino acids that can be lumped together, giving rise to alphabets with fewer than 20 symbols.
Thus, we have subalphabets with fewer than 20 symbols.
The most common classification is that of [alpha]-helix, [beta]-sheet, and coil, which is the structural level we use here in the search for novel amino acid subalphabets.
Merged amino acid subalphabets are of significant interest both in the context of evolution and in protein-structure prediction.
In this article, we introduce a new computational approach for evaluating subalphabets by searching directly for sequence reencodings that improve protein secondary-structure prediction.
In the paper by Wang and Wang (1999), the search for, and ranking of, subalphabets was based on the 190 amino acid substitution scores in the Miyazawa and Jernigan (1996) matrix (MJ matrix).
Using this approach, we generate good subalphabets through an iterative, one-path reduction: at each step we examine the prediction quality obtained by merging two groups (which initially consist of individual amino acids) and keep the best-scoring grouping joined.
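The one-path reduction described above can be sketched as a greedy loop. This is a minimal illustration, not the authors' implementation: `greedy_reduction` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for the real evaluation, which in the article is the quality of secondary-structure prediction on the recoded sequences.

```python
from itertools import combinations


def greedy_reduction(symbols, score):
    """Merge groups pairwise, one step at a time, keeping the best merge.

    `score` takes a partition (list of frozensets) and returns a number,
    higher meaning better. Returns every partition visited, from
    len(symbols) groups down to a single group.
    """
    groups = [frozenset(s) for s in symbols]
    path = [list(groups)]
    while len(groups) > 1:
        best = None
        for a, b in combinations(groups, 2):
            candidate = [g for g in groups if g not in (a, b)] + [a | b]
            value = score(candidate)
            if best is None or value > best[0]:
                best = (value, candidate)
        groups = best[1]  # the winning merge stays joined from now on
        path.append(list(groups))
    return path


def toy_score(partition):
    """Placeholder score (assumption): penalize groups that mix
    hydrophobic residues with other residues."""
    hydrophobic = set("AVLI")
    mixed = sum(1 for g in partition if g & hydrophobic and g - hydrophobic)
    return -mixed


path = greedy_reduction("AVDE", toy_score)
```

Because earlier merges are never undone, the search follows a single path through the space of partitions and needs on the order of $n^2$ evaluations per step rather than an exhaustive enumeration.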