The BEDROCK Project
National Science Foundation
Howard Hughes Medical Institute
Thammasat University

Genetic Codes as Codes: Towards a Theoretical Basis for Bioinformatics

Bioinformatics has developed primarily as a discipline within mathematics and computer science devoted to organizing and analyzing large biological databases. However, biology has much to offer a synthetic discipline of bioinformatics that draws upon and respects the mutual contributions of biology, mathematics, and computer science. In particular, biology has two major theoretical foundations, both evolutionary, namely phylogenetic systematics and population genetics. Together with the traditional, empirically driven, pattern-searching approaches of classical bioinformatics, these can serve as cornerstones of a theoretical foundation for the field. In this re-conception of bioinformatics, mathematics and computer science are instrumental in developing biological theory and in solving practical biological problems. Because the genetic code is both an evolutionary product and a process that mediates the conversion of genotype to phenotype, it is argued here that an evolutionary analysis of genetic codes will fundamentally affect our ability to make meaning out of molecular messages through a theoretically grounded bioinformatics.

Mathematical properties of genetic codes will be demonstrated with respect to their rates of transmission, correctability and detectability of errors, efficiencies, symmetries, and origins by employing coding theory (Baudot codes, Gray codes, Hamming codes, Huffman codes, comma-free codes, etc.), abstract algebra, graph theory, combinatorics, information theory, and phylogenetic systematics of sequences. Genetic codes become much more understandable and elegant to biologists, mathematicians, and computer scientists when they are not considered as mere ciphers, but are instead understood from three perspectives: codes per se, physicochemical interactions, and evolutionary selective pressures. These various faces of genetic codes are useful for making meaning out of molecular messages, applying causal mechanisms to complex patterns, and efficiently storing and retrieving large, complex data sets. In addition, several alternative distance metrics based upon different mathematical representations of genetic codes will be illustrated; these have utility in genomic database searching (comparative sequence analyses), phylogenetic tree construction, and prediction of three-dimensional structure from primary structure. Different evolutionary mechanisms affecting gene expression based upon codon usage will also be considered.
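
As one hedged illustration of what a code-based distance metric might look like, the sketch below labels the four RNA bases with a 2-bit Gray code and measures the Hamming distance between the resulting 6-bit codon strings. The particular bit assignment, and the names GRAY_BITS, encode_codon, hamming, and codon_distance, are assumptions made for this example; the abstract does not specify which representations or metrics are used.

```python
# Hypothetical 2-bit Gray-code labels for the four RNA bases (an assumption
# made for illustration; the source does not specify any assignment).
# Around the cycle A -> G -> C -> U -> A exactly one bit changes per step.
GRAY_BITS = {"A": "00", "G": "01", "C": "11", "U": "10"}


def encode_codon(codon: str) -> str:
    """Return the 6-bit string for a three-base RNA codon."""
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    return "".join(GRAY_BITS[base] for base in codon.upper())


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def codon_distance(c1: str, c2: str) -> int:
    """Hamming distance between two codons under the assumed bit labels."""
    return hamming(encode_codon(c1), encode_codon(c2))


if __name__ == "__main__":
    # Print the distance for a few example codon pairs.
    for pair in [("AAA", "AAG"), ("AAA", "AAC"), ("AUG", "AUG")]:
        print(pair, codon_distance(*pair))
```

Under this assumed labeling, the transitions A/G and C/U differ by one bit, but so do the transversions A/U and G/C, while A/C and G/U differ by two. A different bit assignment would yield a different metric, which is one sense in which the choice of mathematical representation matters.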

Web site tools: