Puneet Varma (Editor)

Conserved sequence


In evolutionary biology, conserved sequences are similar or identical sequences in nucleic acids (DNA and RNA), proteins, or polysaccharides across species (orthologous sequences) or within different molecules produced by the same organism (paralogous sequences). Conservation across species indicates that a sequence has been maintained by evolution despite speciation. A highly conserved sequence is one that has remained unchanged far back up the phylogenetic tree, and hence far back in geological time. For example, the homeobox sequences have been conserved across different phyla including the arthropods (such as fruit flies) and vertebrates (such as mice and humans), so these sequences have remained little changed since the Cambrian explosion of animal body plans some 500 million years ago. Parts of the 16S and 23S ribosomal RNA genes have been identified as the most conserved DNA sequences across the domains of life. A highly conserved region typically indicates that natural selection has continually eliminated forms with mutations in that sequence.

Nucleic acid and protein sequences

Highly conserved DNA sequences are thought to have functional value, although the role of many highly conserved non-coding DNA sequences is not understood. Ultra-conserved elements or sequences (UCEs or UCRs, ultra-conserved regions) that share 100% identity among human, mouse and rat were first described by Bejerano and colleagues in 2004. One study that deleted four highly conserved non-coding DNA sequences in mice yielded viable mice with no significant phenotypic differences; the authors described their findings as "unexpected". Many regions of DNA, including highly conserved DNA sequences, consist of repeated sequence elements. One possible explanation of this null result is that removing only one copy, or a subset, of a repeated sequence could preserve the phenotype if a single copy is sufficient and the repetitions are superfluous to essential life processes; the paper did not specify whether the deleted sequences were repeated sequences. Although the biological function of most conserved sequences is still unknown, transcripts derived from a few conserved sequences have been shown to be deregulated in human cancer tissues.

A common notation for the level of sequence conservation is the one used by the Clustal alignment programs. Below a set of aligned sequences, each residue column is marked as fully conserved (*), containing only conservative mutations (:), containing semi-conservative mutations (.), or containing non-conservative mutations ( ).
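The notation above can be illustrated with a minimal Python sketch. It assumes the "strong" and "weak" residue groups commonly documented for Clustal protein alignments; the group lists, the function names `column_symbol` and `conservation_line`, and the toy sequences are illustrative assumptions, not taken from the article or from any Clustal source code.

```python
# Residue groups assumed from commonly cited Clustal documentation.
# A column whose residues all fall in one strong group gets ':',
# in one weak group gets '.', and otherwise gets ' '.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return the conservation symbol for one alignment column."""
    residues = set(r.upper() for r in column)
    if "-" in residues:          # any gap: treat the column as not conserved
        return " "
    if len(residues) == 1:       # identical residue in every sequence
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(aligned):
    """Build the symbol line printed below a block of aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_symbol(col) for col in zip(*aligned))


# Hypothetical toy alignment, one column per symbol type.
toy_alignment = ["MKSALY", "MRAADF", "MKGAKW"]
for seq in toy_alignment:
    print(seq)
print(conservation_line(toy_alignment))
```

In this toy alignment the second column (K, R, K) falls within one strong group and the third (S, A, G) only within a weak group, so they receive different symbols.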

CpG islands

Cytosine-guanine dinucleotides (CpG sites) are present at high frequency in CpG islands within the promoter regions of about 70% of human genes. CpG sites are subject to methylation on their cytosine. If a substantial proportion of CpG sites in a CpG island within a promoter of a gene are methylated, this silences expression of that gene.
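As a rough illustration of what "high frequency" of CpG sites means, the sketch below computes GC content and the observed-to-expected CpG ratio of a sequence. The thresholds in `looks_like_cpg_island` (length of at least 200 bp, GC content above 50%, ratio above 0.6) are commonly cited criteria and are an assumption here; the article itself does not define them, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")            # "CG" cannot overlap itself
    gc_content = (c_count + g_count) / n
    expected = c_count * g_count / n       # expected CpGs if C and G were independent
    ratio = cpg_count / expected if expected else 0.0
    return gc_content, ratio


def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to a candidate region."""
    gc_content, ratio = cpg_stats(seq)
    return len(seq) >= min_length and gc_content > min_gc and ratio > min_ratio


# Hypothetical toy inputs: a CG repeat and an AT repeat.
print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100))
print(looks_like_cpg_island("AT" * 100))
```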

In mammalian germ line DNA, demethylation occurs in the zygote immediately following fertilization, so that relatively few genes are silenced at that time. During development of the embryo, however, CpG islands are methylated again, silencing different sets of genes in cells located in different areas of the embryo and thus causing tissue differentiation.

Although CpG sites are frequent within CpG islands of promoter regions, they also occur in other regions of the genome, including non-coding introns and non-coding three prime untranslated regions of genes. The mutation frequency of CpG sites within CpG islands of gene promoters has been compared with that of CpG sites in non-coding regions of genes, using sequence comparisons of homologous genes in chimpanzees and humans. CpG sites in promoter CpG islands were found to be mutated at a substantially lower rate than CpG sites in the non-coding regions of genes. Thus CpG sites within gene promoters tend to be conserved in evolution.
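The kind of comparison described above can be sketched as follows. This is a simplified illustration, not the method of the original study: it assumes two ungapped, pre-aligned sequences of equal length, treats the first as the reference, and counts a reference CpG site as mutated when the other sequence no longer has "CG" at that position. The function name and sequences are hypothetical.

```python
def cpg_mutation_rate(reference, other):
    """Fraction of CpG sites in `reference` that are not CpG in `other`.

    Assumes both sequences are already aligned without gaps.
    Returns None when the reference contains no CpG site.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    reference, other = reference.upper(), other.upper()
    sites = [i for i in range(len(reference) - 1)
             if reference[i:i + 2] == "CG"]
    if not sites:
        return None
    mutated = sum(1 for i in sites if other[i:i + 2] != "CG")
    return mutated / len(sites)


# Hypothetical toy regions; real analyses would use homologous gene sequences.
promoter_rate = cpg_mutation_rate("ACGTCGAACG", "ACGTCGAACG")
intron_rate = cpg_mutation_rate("ACGTCGAACG", "ACGTTGAACG")
print(promoter_rate, intron_rate)
```

A lower rate in the promoter region than in the non-coding region would correspond to the pattern reported for CpG islands.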

GERP scores

A GERP (Genomic Evolutionary Rate Profiling) score measures evolutionary conservation of genetic sequences across species. There is a relationship between a sequence's GERP score and the proportion of variant alleles within that sequence: as the GERP score increases, variation within the sequence becomes rarer. A higher GERP score signifies a more highly conserved sequence, in which alteration is harmful, so adverse variants reduce the fitness of the organism and are selected against.
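The idea behind such a score can be sketched as a count of "rejected substitutions": the number of substitutions expected at a position under neutral evolution minus the number actually observed across the aligned species. The sketch below only shows that arithmetic; the real method estimates both quantities from a phylogenetic tree, which is not attempted here, and the function name and numbers are hypothetical.

```python
def rejected_substitutions(expected_neutral, observed):
    """Expected minus observed substitutions at one alignment position.

    Larger positive values suggest stronger constraint (more conservation).
    Both inputs are assumed to be precomputed, non-negative estimates.
    """
    if expected_neutral < 0 or observed < 0:
        raise ValueError("substitution counts must be non-negative")
    return expected_neutral - observed


# Hypothetical positions: (expected under neutrality, observed).
positions = {"site_a": (6.0, 0.5), "site_b": (6.0, 5.5)}
scores = {name: rejected_substitutions(exp, obs)
          for name, (exp, obs) in positions.items()}
most_constrained = max(scores, key=scores.get)
print(scores, most_constrained)
```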

Biological role

Sequences are only likely to be highly conserved through geological time if they are required for basic cellular functions (such as coding for vital enzymes), stability, embryonic development, or reproduction. Sequence similarity is used as evidence of structural and functional conservation, and of evolutionary relationships between sequences. Consequently, functional elements are frequently identified by searching for conserved sequences in a genome.
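A search of this kind can be sketched with a simple sliding window over a multiple alignment. The example scores each column by the fraction of sequences sharing its most common residue and reports windows whose mean score meets a threshold. The window size, the threshold, the scoring rule, and the function names are assumptions chosen for illustration; production tools use phylogeny-aware models.

```python
from collections import Counter


def column_identity(aligned):
    """Per-column fraction of sequences that share the most common residue.

    Gaps ('-') never count as matches.
    """
    scores = []
    for column in zip(*aligned):
        counts = Counter(r for r in column if r != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


def conserved_windows(aligned, window=5, threshold=0.9):
    """Return (start, end) half-open windows whose mean identity >= threshold."""
    scores = column_identity(aligned)
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            hits.append((start, start + window))
    return hits


# Hypothetical toy alignment: the first five columns are identical.
toy = ["ACGTACGTAA",
       "ACGTATTTCA",
       "ACGTAGCAGA"]
print(conserved_windows(toy))
```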

Conservation of protein-coding sequences leads to the presence of identical amino acid residues at analogous regions of the protein structure and hence similar function. Conservative mutations change amino acids to chemically similar residues and so may not affect the protein's function. Among the most highly conserved sequences are the active sites of enzymes and the binding sites of protein receptors.

Conserved non-coding sequences do not encode protein, but often harbour cis-regulatory elements, including the evo-devo gene toolkit. Some deletions of highly conserved sequences in humans (hCONDELs) and other organisms have been suggested to be a potential cause of the anatomical and behavioural differences between humans and other mammals. The TATA promoter sequence is an example of a highly conserved DNA sequence found in most eukaryotes.

Polymeric carbohydrate sequences

The monosaccharide sequence of the glycosaminoglycan heparin is conserved across a wide range of species.

Applications

Research on conserved genetic sequences has several applications. Detecting similar sequences across the genomes of diverse species can provide useful information about the evolutionary history of those species. Examining conserved sequences can also aid medical research: by identifying rare alleles within conserved sequences, information can be compiled and used to assess disease risk in humans. Genome-wide association studies (GWAS) compare alleles across the human genome and assess their association with the risk of particular diseases.
