
Thursday, June 05, 2008

Intelligent design research: Using Chinese anagrams to model proteins

The Biologic Institute (an ID research center) has a paper published in PLOS One, an online peer-reviewed journal: Stylus: A System for Evolutionary Experimentation Based on a Protein/Proteome Model with Non-Arbitrary Functional Constraints, by Douglas D. Axe, Brendan W. Dixon, and Philip Lu (Biologic Institute, Redmond, Washington, United States of America).

It's a computer program that offers Chinese anagrams to try to model the awful complexity of proteins. From the paper:

Although the Han characters all originally functioned as stand-alone words, the number of concepts needing words has increased dramatically since the character set became effectively fixed. Instead of inventing new characters, the solution was to combine existing characters to form multi-character words, which are now common. These words are like multi-protein complexes in that their function requires correct arrangement of two or more parts. However, while protein complexes are compound structures, multi-character words are separate structures arranged sequentially. The next section explains how this is implemented in the vector world and considers the implications for functional constraints.

High-level functions: From sentences to texts, and operons to proteomes. In both biology and language, the jump from elementary function to useful function brings with it a new level of complexity. Words are elementary semantic units, in that meanings are attached to symbols starting at the word level. But language only becomes useful for communication when word-level meanings are combined to convey more complex meanings. Similarly, although proteins and protein complexes perform low-level functions of biological relevance, organismal capabilities—from survival-enhancing phenotypes all the way up to survival itself—require the coordinated combination of many such functions. Ultimately whole proteomes are coordinated in this way.

I was going to try to copy some of the illustrations but the best I can do for now is link you to the slide show.

Here's a comment from a mailing from the Discovery Institute:
Chinese writing, in particular, employs structural characters that are analogous in some interesting ways to protein structures. Like folded proteins, these written characters perform the low level functions from which higher functions can be achieved.

Stylus builds on this analogy by using a life-like genetic code to specify simple building blocks (twenty vectors, analogous to the twenty amino acids) that in turn build Chinese characters. In this way, gene sequences looking very much like biological sequences encode vector chains with two-dimensional shapes. If those chains have the right geometry, by conforming to the shape of a character, they provide basic semantic function. And the functional hierarchy builds up from there.

The result is an artificial genetic system where genes encode basic functions by means of appropriate structures, and genomes encode higher functions that employ these basic functions. So, if it can be written in Chinese, it can be encoded in a genome and represented in working form by a proteome.

The big question, of course, is whether Darwinian evolution can do anything interesting in a system like this. In view of its similarity to life, the answer would be hard to ignore either way.

And the truth is—we don’t know the answer. Yet.

Guess someone will have to lean on PLOS One to take this paper down. Meanwhile, here's the abstract:

The study of protein evolution is complicated by the vast size of protein sequence space, the huge number of possible protein folds, and the extraordinary complexity of the causal relationships between protein sequence, structure, and function. Much simpler model constructs may therefore provide an attractive complement to experimental studies in this area. Lattice models, which have long been useful in studies of protein folding, have found increasing use here. However, while these models incorporate actual sequences and structures (albeit non-biological ones), they incorporate no actual functions—relying instead on largely arbitrary structural criteria as a proxy for function. In view of the central importance of function to evolution, and the impossibility of incorporating real functional constraints without real function, it is important that protein-like models be developed around real structure–function relationships. Here we describe such a model and introduce open-source software that implements it. The model is based on the structure–function relationship in written language, where structures are two-dimensional ink paths and functions are the meanings that result when these paths form legible characters. To capture something like the hierarchical complexity of protein structure, we use the traditional characters of Chinese origin. Twenty coplanar vectors, encoded by base triplets, act like amino acids in building the character forms. This vector-world model captures many aspects of real proteins, including life-size sequences, a life-size structural repertoire, a realistic genetic code, secondary, tertiary, and quaternary structure, structural domains and motifs, operon-like genetic structures, and layered functional complexity up to a level resembling bacterial genomes and proteomes. Stylus is a full-featured implementation of the vector world for Unix systems. 
To demonstrate the utility of Stylus, we generated a sample set of homologous vector proteins by evolving successive lines from a single starting gene. These homologues show sequence and structure divergence resembling those of natural homologues in many respects, suggesting that the system may be sufficiently life-like for informative comparison to biology.
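
To make the "twenty coplanar vectors, encoded by base triplets" idea concrete, here is a rough Python sketch of how a gene could be translated into a two-dimensional path. It is not the Stylus code. The vector set (twenty unit vectors spaced 18 degrees apart), the codon assignment (sense codons dealt out to vectors in order), and the names VECTORS, CODON_TABLE, and translate are all my own assumptions for illustration. Only the three stop codons are borrowed from the standard genetic code.

```python
import math
from itertools import product

BASES = "TCAG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

# Assumption: twenty unit vectors, 18 degrees apart. The real Stylus
# vector set is defined in the paper and may look quite different.
VECTORS = [
    (math.cos(math.radians(18 * i)), math.sin(math.radians(18 * i)))
    for i in range(20)
]


def build_codon_table():
    """Map each of the 61 sense codons to a vector index (hypothetical table)."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    sense = [codon for codon in codons if codon not in STOP_CODONS]
    # Deal the sense codons out to the 20 vectors in order, so the code is
    # degenerate (several codons per vector), loosely like the real one.
    return {codon: i % 20 for i, codon in enumerate(sense)}


CODON_TABLE = build_codon_table()


def translate(gene):
    """Return the list of points traced by a gene, starting at the origin.

    Reads the gene three bases at a time and stops at the first stop codon.
    """
    x, y = 0.0, 0.0
    path = [(x, y)]
    for i in range(0, len(gene) - 2, 3):
        codon = gene[i:i + 3]
        if codon in STOP_CODONS:
            break
        dx, dy = VECTORS[CODON_TABLE[codon]]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Two TTT codons followed by a stop codon: two steps along vector 0.
    for point in translate("TTTTTTTAA"):
        print(point)
```

In the paper's scheme, the next step would be to score how well such a path conforms to the shape of a Han character. That scoring is what gives the model its "function", and this sketch does not attempt it.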

