Chemical biology

Updated on Apr 25, 2026

Edit

Comment

Chemical biology is a scientific discipline spanning the fields of chemistry, biology, and physics. It involves the application of chemical techniques, tools, and analyses, and often compounds produced through synthetic chemistry, to the study and manipulation of biological systems. Chemical biologists attempt to use chemical principles to modulate systems to either investigate the underlying biology or create new function. Research done by chemical biologists is often closer related to that of cell biology than biochemistry. Biochemists study the chemistry of biomolecules and regulation of biochemical pathways within cells and tissues, e.g. cAMP or cGMP, while chemical biologists deal with novel chemical compounds applied to biology.

Introduction

Some forms of chemical biology attempt to answer biological questions by directly probing living systems at the chemical level. In contrast to research using biochemistry, genetics, or molecular biology, where mutagenesis can provide a new version of the organism or cell of interest, chemical biology studies probe systems in vitro and in vivo with small molecules that have been designed for a specific purpose or identified on the basis of biochemical or cell-based screening.

Chemical biology is one of many interfacial sciences that are characteristic of a general trend away from older, reductionist fields toward those whose goals are to achieve a description of scientific holism. In this sense, it is related to other fields such as proteomics. Chemical biology has scientific, historical and philosophical roots in medicinal chemistry, supramolecular chemistry (particularly host-guest chemistry), bioorganic chemistry, pharmacology, genetics, biochemistry, and metabolic engineering.

Proteomics

Proteomics investigates the proteome, the set of expressed proteins at a given time under defined conditions. As a discipline, proteomics has moved past rapid protein identification and has developed into a biological assay for quantitative analysis of complex protein samples by comparing protein changes in differently perturbed systems. Current goals in proteomics include determining protein sequences, abundance and any post-translational modifications. Also of interest are protein–protein interactions, cellular distribution of proteins and understanding protein activity. Another important aspect of proteomics is the advancement of technology to achieve these goals.

Protein levels, modifications, locations, and interactions are complex and dynamic properties. With this complexity in mind, experiments need to be carefully designed to answer specific questions especially in the face of the massive amounts of data that are generated by these analyses. The most valuable information comes from proteins that are expressed differently in a system being studied. These proteins can be compared relative to each other using quantitative proteomics, which allows a protein to be labeled with a mass tag. Proteomic technologies must be sensitive and robust, it is for these reasons, the mass spectrometer has been the workhorse of protein analysis. The high precision of mass spectrometry can distinguish between closely related species and species of interest can be isolated and fragmented within the instrument. Its applications to protein analysis was only possible in the late 1980s with the development of protein and peptide ionization with minimal fragmentation. These breakthroughs were ESI and MALDI. Mass spectrometry technologies are modular and can be chosen or optimized to the system of interest.

Chemical biologists are poised to impact proteomics through the development of techniques, probes and assays with synthetic chemistry for the characterization of protein samples of high complexity. These approaches include the development of enrichment strategies, chemical affinity tags and probes.

Enrichment techniques

Samples for Proteomics contain a myriad of peptide sequences, the sequence of interest may be highly represented or of low abundance. However, for successful MS analysis the peptide should be enriched within the sample. Reduction of sample complexity is achieved through selective enrichment using affinity chromatography techniques. This involves targeting a peptide with a distinguishing feature like a biotin label or a post translational modification. Interesting methods have been developed that include the use of antibodies, lectins to capture glycoproteins, immobilized metal ions to capture phosphorylated peptides and suicide enzyme substrates to capture specific enzymes. Here, chemical biologists can develop reagents to interact with substrates, specifically and tightly, to profile a targeted functional group on a proteome scale. Development of new enrichment strategies is needed in areas like non-ser/thr/tyr phosphorylation sites and other post translational modifications. Other methods of decomplexing samples relies on upstream chromatographic separations.

Affinity tags

Chemical synthesis of affinity tags has been crucial to the maturation of quantitative proteomics. iTRAQ, Tandem mass tags (TMT) and Isotope-coded affinity tag (ICAT) are protein mass-tags that consist of a covalently attaching group, a mass (isobaric or isotopic) encoded linker and a handle for isolation. Varying mass-tags bind to different proteins as a sort of footprint such that when analyzing cells of differing perturbations, the levels of each protein can be compared relatively after enrichment by the introduced handle. Other methods include SILAC and heavy isotope labeling. These methods have been adapted to identify complexing proteins by labeling a bait protein, pulling it down and analyzing the proteins it has complexed. Another method creates an internal tag by introducing novel amino acids that are genetically encoded in prokaryotic and eukaryotic organisms. These modifications create a new level of control and can facilitate photocrosslinking to probe protein–protein interactions. In addition, keto, acetylene, azide, thioester, boronate, and dehydroalanine- containing amino acids can be used to selectively introduce tags, and novel chemical functional groups into proteins.

Enzyme probes

To investigate enzymatic activity as opposed to total protein, activity-based reagents have been developed to label the enzymatically active form of proteins (see Activity-based proteomics). For example, serine hydrolase- and cysteine protease-inhibitors have been converted to suicide inhibitors. This strategy enhances the ability to selectively analyze low abundance constituents through direct targeting. Structures that mimic these inhibitors could be introduced with modifications that will aid proteomic analysis- like an identification handle or mass tag. Enzyme activity can also be monitored through converted substrate. This strategy relies on using synthetic substrate conjugates that contain moieties that are acted upon by specific enzymes. The product conjugates are then captured by an affinity reagent and analyzed. The measured concentration of product conjugate allow the determination of the enzyme velocity. Identification of enzyme substrates (of which there may be hundreds or thousands, many of which unknown) is a problem of significant difficulty in proteomics and is vital to the understanding of signal transduction pathways in cells; techniques for labelling cellular substrates of enzymes is an area chemical biologists can address. A method that has been developed uses "analog-sensitive" kinases to label substrates using an unnatural ATP analog, facilitating visualization and identification through a unique handle.

Glycobiology

While DNA, RNA and proteins are all encoded at the genetic level, there exists a separate system of trafficked molecules in the cell that are not encoded directly at any direct level: sugars. Thus, glycobiology is an area of dense research for chemical biologists. For instance, live cells can be supplied with synthetic variants of natural sugars in order to probe the function of the sugars in vivo. Carolyn Bertozzi, previously at University of California, Berkeley, has developed a method for site-specifically reacting molecules the surface of cells that have been labeled with synthetic sugars.

Combinatorial chemistry

Chemical biologists used automated synthesis of many diverse compounds in order to experiment with effects of small molecules on biological processes. More specifically, they observe changes in the behaviors of proteins when small molecules bind to them. Such experiments may supposedly lead to discovery of small molecules with antibiotic or chemotherapeutic properties. These approaches are identical to those employed in the discipline of pharmacology.

Molecular sensing

Chemical biologists are also interested in developing new small-molecule and biomolecule-based tools to study biological processes, often by molecular imaging techniques. The field of molecular sensing was popularized by Roger Tsien's work developing calcium-sensing fluorescent compounds as well as pioneering the use of GFP, for which he was awarded the 2008 Nobel Prize in Chemistry. Today, researchers continue to utilize basic chemical principles to develop new compounds for the study of biological metabolites and processes.

siRNA-A tool in chemical biology

siRNA or small interfering RNAs owe their origins to the difficulties the scientific community faced utilizing classical and reverse genetics methods in studying gene expression. Disrupting genes to study their functions is not always optimal; neither is mapping mutations back to their genes easy. The whole process is expensive as well as time-consuming, which is why a lot of effort has been devoted to develop methods to silence gene expression in sequence specific manner using nucleic acids. They have the potential to be powerful tools in the field of chemical biology to study the chemistry of gene expression in therapeutic targets of bacteria and viruses.

A number of different types of nucleic acid molecules have already gained prominence because of their potential as therapeutics. They target mRNAs to silence the genes in a sequence specific manner. Oligodeoxyribonucleic acids, ODNs utilize steric interaction to silence gene expression. They can also form triple helices in conjunction with the DNA duplex. Whereas ribozymes can be chemically designed to target specific genes and cleave them in a sequence specific manner. The most promising of these methods however is utilization of short interfering RNA or siRNA to silence gene expression.

siRNA

siRNA or short interfering RNAs exist in nature as a means for the express purpose of controlling gene expression. It was discovered in petunia as a post-transcriptional gene silencing measure. It is the resultant product when a long double-strand RNA of 20 -25 nucleotides length was processed in the cells by the enzyme DICER. The newly synthesized siRNA assemble into endoribonuclease-containing complexes known as RNA-induced silencing complexes (RISCs), unwinding in the process. The activated RISC then binds to the complementary RNA molecules by base pairing interactions between the siRNA strand and the mRNA, which is then cleaved. This mechanism is known as RNA interference or RNAi.

Designing and synthesizing siRNAs

It is now possible to order siRNAs designed and synthesized with the express purpose of targeting a particular sequence. The ambion website has a lot of information on the optimal design of siRNAs.

siRNAs can be synthesized chemically, or enzymatically. RNase III or DICER can be used to cleave the long dsRNAs to produce siRNAs. However the most expedient method is the use of plasmids to express them in vivo by delivering them into the target cell using vectors. This method allows the siRNAs to be expressed in the target cell stably, over a period of time and overcomes the drawbacks of the transience of their effect. Numerous strategies have been developed in order to deliver the siRNA into the cell efficiently:

Electroporation
Local and systemic injection: This method was the first success scientists had in silencing genes using siRNAs. They were successfully delivered into highly vascularized tissue in mice through using high-pressure tail vein injection. Greater than 90% loss in gene expression was observed in the targets.
siRNA producing viruses: This method shows great promise in gene therapy, and research is progressing in order to generate recombinant viruses that can produce siRNA in target cells.
Small molecules that enhance transdermal penetration: Research in this field is moving at a fast pace in order to synthesize small organic molecules that, if injected in conjunction with siRNAs, can help them penetrate into the target cells.

Biological uses of the RNAi approach

The principal purpose of studying siRNA mediated RNA interference is probably to investigate gene function.

It is so much easier to make genetic knock-outs by simply introducing sequence-specific siRNAs into cells; multi-copy genes can be silenced in one fell swoop by this method. Creation of double-knockout mutants is also easier and consumes much less time. Using local injections in specific regions of the model organisms also help in creating spatially separated and restricted knockout. siRNAs are also being successfully used to screen whole genomes in organisms such as C. elegans and Drosophilla melanogaster. Even in mammalian systems such as Danio rerio (zebrafish) that usually prove intractable to all gene silencing methods, even dsRNA injection, siRNA can do the job. It is paving a new way in development of therapeutics by identifying human gene orthologs in other species in a remarkably short period of time.

Numerous high-throughput screening approaches are being developed to screen large libraries of cells rapidly in order to identify drug targets. A brief description of few of the screening techniques:

Pooled Format Screening: A reagent library of RNAi has to be introduced to the cells so that a particular cell is in one particular reagent. The primary hits are then identified and their identity elucidate by sequencing techniques.
Arrayed Format Screening: Each RNAi reagent is placed in separate wells in a plate and multiple manipulations can be done to identify their targets, which are then detected by fluorescence readouts, imaging techniques and other methods as well. Thus the identity of the target cell can be determined through the identity of the reagent in the database.
Multiplexed methods: A combination of various assays can be used for high-throughput screening of candidate drug targets. For example, candidate genes can be identified through informatics based methods and then screened against a library of reagents. Many other such methods are being developed in order to make the job of screening therapeutic targets easier.

siRNA based therapeutics

The future in this field rests in the development of siRNA-based drugs.

This could prove to be a powerful tool in gene based therapy. Research is now concentrated on developing strategies to design siRNA therapeutics for clinical use. A brief description of some novel strategies for siRNA drug development is provided here:

Direct Mutation Targeting: The siRNAs are designed to perfectly match mutant alleles but contain one or more mismatches with wild-type alleles, leading to specific degradation of the matching, mutant transcripts.
Indirect Mutation Targeting: The siRNA approach will not work if the mutant alleles are too similar to wild type. So an indirect approach is taken in which siRNAs are designed against disease linked markers such as SNP variations. The ones that are screened as positive are targeted for degradation.
Exon-specific targeting: siRNAs are designed to target expressed regions (exons) of the gene.
Targeting exon skipped transcripts: If the problem in the gene lies in aberrant splicing post-transcription, siRNA can be designed to target the unnatural exon-exon interface arising as a result of such alternative splicing.

Employing biology

Many research programs are also focused on employing natural biomolecules to perform a task or act as support for a new chemical method or material. In this regard, researchers have shown that DNA can serve as a template for synthetic chemistry, self-assembling proteins can serve as a structural scaffold for new materials, and RNA can be evolved in vitro to produce new catalytic function.

Protein misfolding and aggregation as a cause of disease

A common form of aggregation is long, ordered spindles called amyloid fibrils that are implicated in Alzheimer’s disease and that have been shown to consist of cross-linked beta sheet regions perpendicular to the backbone of the polypeptide. Another form of aggregation occurs with prion proteins, the glycoproteins found with Creutzfeldt–Jakob disease and bovine spongiform encephalopathy. In both structures, aggregation occurs through hydrophobic interactions and water must be excluded from the binding surface before aggregation can occur. A movie of this process can be seen in "Chemical and Engineering News". The diseases associated with misfolded proteins are life-threatening and extremely debilitating, which makes them an important target for chemical biology research.

Through the transcription and translation process, DNA encodes for specific sequences of amino acids. The resulting polypeptides fold into more complex secondary, tertiary, and quaternary structures to form proteins. Based on both the sequence and the structure, a particular protein is conferred its cellular function. However, sometimes the folding process fails due to mutations in the genetic code and thus the amino acid sequence or due to changes in the cell environment (e.g. pH, temperature, reduction potential, etc.). Misfolding occurs more often in aged individuals or in cells exposed to a high degree of oxidative stress, but a fraction of all proteins misfold at some point even in the healthiest of cells.

Normally when a protein does not fold correctly, molecular chaperones in the cell can encourage refolding back into its active form. When refolding is not an option, the cell can also target the protein for degradation back into its component amino acids via proteolytic, lysosomal, or autophagic mechanisms. However, under certain conditions or with certain mutations, the cells can no longer cope with the misfolded protein(s) and a disease state results. Either the protein has a loss-of-function, such as in cystic fibrosis, in which it loses activity or cannot reach its target, or the protein has a gain-of-function, such as with Alzheimer's disease, in which the protein begins to aggregate causing it to become insoluble and non-functional.

Protein misfolding has previously been studied using both computational approaches as well as in vivo biological assays in model organisms such as Drosophila melanogaster and C. elegans. Computational models use a de novo process to calculate possible protein structures based on input parameters such as amino acid sequence, solvent effects, and mutations. This method has the shortcoming that the cell environment has been drastically simplified, which limits the factors that influence folding and stability. On the other hand, biological assays can be quite complicated to perform in vivo with high-throughput like efficiency and there always remains the question of how well lower organism systems approximate human systems.

Dobson et al. propose combining these two approaches such that computational models based on the organism studies can begin to predict what factors will lead to protein misfolding. Several experiments have already been performed based on this strategy. In experiments on Drosophila, different mutations of beta amyloid peptides were evaluated based on the survival rates of the flies as well as their motile ability. The findings from the study show that the more a protein aggregates, the more detrimental the neurological dysfunction. Further studies using transthyretin, a component of cerebrospinal fluid that binds to beta amyloid peptide deterring aggregation but can itself aggregate especially when mutated, indicate that aggregation prone proteins may not aggregate where they are secreted and rather are deposited in specific organs or tissues based on each mutation. Kelly et al. have shown that the more stable, both kinetically and thermodynamically, a misfolded protein is the more likely the cell is to secrete it from the endoplasmic reticulum rather than targeting the protein for degradation. In addition, the more stress that a cell feels from misfolded proteins the more probable new proteins will misfold. These experiments as well as others having begun to elucidate both the intrinsic and extrinsic causes of misfolding as well as how the cell recognizes if proteins have folded correctly.

As more information is obtained on how the cell copes with misfolded proteins, new therapeutic strategies begin to emerge. An obvious path would be prevention of misfolding. However, if protein misfolding cannot be avoided, perhaps the cell's natural mechanisms for degradation can be bolstered to better deal with the proteins before they begin to aggregate. Before these ideas can be realized, many more experiments need to be done to understand the folding and degradation machinery as well as what factors lead to misfolding. More information about protein misfolding and how it relates to disease can be found in the recently published book by Dobson, Kelly, and Rameriz-Alvarado entitled Protein Misfolding Diseases Current and Emerging Principles and Therapies.

Chemical synthesis of peptides

In contrast to the traditional biotechnological practice of obtaining peptides or proteins by isolation from cellular hosts through cellular protein production, advances in chemical techniques for the synthesis and ligation of peptides has allowed for the total synthesis of some peptides and proteins. Chemical synthesis of proteins is a valuable tool in chemical biology as it allows for the introduction of non-natural amino acids as well as residue specific incorporation of "posttranslational modifications" such as phosphorylation, glycosylation, acetylation, and even ubiquitination. These capabilities are valuable for chemical biologists as non-natural amino acids can be used to probe and alter the functionality of proteins, while post translational modifications are widely known to regulate the structure and activity of proteins. Although strictly biological techniques have been developed to achieve these ends, the chemical synthesis of peptides often has a lower technical and practical barrier to obtaining small amounts of the desired protein. Given the widely recognized importance of proteins as cellular catalysts and recognition elements, the ability to precisely control the composition and connectivity of polypeptides is a valued tool in the chemical biology community and is an area of active research.

While chemists have been making peptides for over 100 years, the ability to efficiently and quickly synthesize short peptides came of age with the development of Bruce Merrifield's solid phase peptide synthesis (SPPS). Prior to the development of SPPS, the concept of step-by-step polymer synthesis on an insoluble support was without chemical precedent. The use of a covalently bound insoluble polymeric support greatly simplified the process of peptide synthesis by reducing purification to a simple "filtration and wash" procedure and facilitated a boom in the field of peptide chemistry. The development and "optimization" of SPPS took peptide synthesis from the hands of the specialized peptide synthesis community and put it into the hands of the broader chemistry, biochemistry, and now chemical biology community. SPPS is still the method of choice for linear synthesis of polypeptides up to 50 residues in length and has been implemented in commercially available automated peptide synthesizers. One inherent shortcoming in any procedure that calls for repeated coupling reactions is the buildup of side products resulting from incomplete couplings and side reactions. This places the upper bound for the synthesis of linear polypeptide lengths at around 50 amino acids, while the "average" protein consists of 250 amino acids. Clearly, there was a need for development of "non-linear" methods to allow synthetic access to the average protein.

Although the shortcomings of linear SPPS were recognized not long after its inception, it took until the early 1990s for effective methodology to be developed to ligate small peptide fragments made by SPPS, into protein sized polypeptide chains (for recent review of peptide ligation strategies, see review by Dawson et al.). The oldest and best developed of these methods is termed native chemical ligation. Native chemical ligation was unveiled in a 1994 paper from the laboratory of Stephen B. H. Kent. Native chemical ligation involves the coupling of a C-terminal thioester and an N-terminal cysteine residue, ultimately resulting in formation of a "native" amide bond. Further refinements in native chemical ligation have allowed for kinetically controlled coupling of multiple peptide fragments, allowing access to moderately sized peptides such as an HIV-protease dimer and human lysozyme. Even with the successes and attractive features of native chemical ligation, there are still some drawbacks in the utilization of this technique. Some of these drawbacks include the installation and preservation of a reactive C-terminal thioester, the requirement of an N-terminal cysteine residue (which is the second-least-common amino acid in proteins), and the requirement for a sterically unincumbering C-terminal residue.

Other strategies that have been used for the ligation of peptide fragments using the acyl transfer chemistry first introduced with native chemical ligation include expressed protein ligation, sulfurization/desulfurization techniques, and use of removable thiol auxiliaries.

Expressed protein ligation allows for the biotechnological installation of a C-terminal thioester using intein biochemistry, thereby allowing the appendage of a synthetic N-terminal peptide to the recombinantly produced C-terminal portion. This technique allows for access to much larger proteins, as only the N-terminal portion of the resulting protein has to be chemically synthesized. Both sulfurization/desulfurization techniques and the use of removable thiol auxiliaries involve the installation of a synthetic thiol moiety to carry out the standard native chemical ligation chemistry, followed by removal of the auxiliary/thiol. These techniques help to overcome the requirement of an N-terminal cysteine needed for standard native chemical ligation, although the steric requirements for the C-terminal residue are still limiting.

A final category of peptide ligation strategies include those methods not based on native chemical ligation type chemistry. Methods that fall in this category include the traceless Staudinger ligation, azide-alkyne dipolar cycloadditions, and imine ligations.

Major contributors in this field today include Stephen B. H. Kent, Philip E. Dawson, and Tom W. Muir, as well as many others involved in methodology development and applications of these strategies to biological problems.

Protein design by directed evolution

One of the primary goals of protein engineering is the design of novel peptides or proteins with a desired structure and chemical activity. Because our knowledge of the relationship between primary sequence, structure, and function of proteins is limited, rational design of new proteins with enzymatic activity is extremely challenging. Directed evolution, repeated cycles of genetic diversification followed by a screening or selection process, can be used to mimic Darwinian evolution in the laboratory to design new proteins with a desired activity.

Several methods exist for creating large libraries of sequence variants. Among the most widely used are subjecting DNA to UV radiation or chemical mutagens, error-prone PCR, degenerate codons, or recombination. Once a large library of variants is created, selection or screening techniques are used to find mutants with a desired attribute. Common selection/screening techniques include fluorescence-activated cell sorting (FACS), mRNA display, phage display, or in vitro compartmentalization. Once useful variants are found, their DNA sequence is amplified and subjected to further rounds of diversification and selection. Since only proteins with the desired activity are selected, multiple rounds of directed evolution lead to proteins with an accumulation beneficial traits.

There are two general strategies for choosing the starting sequence for a directed evolution experiment: de novo design and redesign. In a protein design experiment, an initial sequence is chosen at random and subjected to multiple rounds of directed evolution. For example, this has been employed successfully to create a family of ATP-binding proteins with a new folding pattern not found in nature. Random sequences can also be biased towards specific folds by specifying the characteristics (such as polar vs. nonpolar) but not the specific identity of each amino acid in a sequence. Among other things, this strategy has been used to successfully design four-helix bundle proteins. Because it is often thought that a well-defined structure is required for activity, biasing a designed protein towards adopting a specific folded structure is likely to increase the frequency of desirable variants in constructed libraries.

In a protein redesign experiment, an existing sequence serves as the starting point for directed evolution. In this way, old proteins can be redesigned for increased activity or new functions. Protein redesign has been used for protein simplification, creation of new quaternary structures, and topological redesign of a chorismate mutase. To develop enzymes with new activities, one can take advantage of promiscuous enzymes or enzymes with significant side reactions. In this regard, directed evolution has been used on γ-humulene synthase, an enzyme that creates over 50 different sesquiterpenes, to create enzymes that selectively synthesize individual products. Similarly, completely new functions can be selected for from existing protein scaffolds. In one example of this, an RNA ligase was created from a zinc finger scaffold after 17 rounds of directed evolution. This new enzyme catalyzes a chemical reaction not known to be catalyzed by any natural enzyme.

Computational methods, when combined with experimental approaches, can significantly assist both the design and redesign of new proteins through directed evolution. Computation has been used to design proteins with unnatural folds, such as a right-handed coiled coil. These computational approaches could also be used to redesign proteins to selectively bind specific target molecules. By identifying lead sequences using computational methods, the occurrence of functional proteins in libraries can be dramatically increased before any directed evolution experiments in the laboratory.

Manfred T. Reetz, Frances Arnold, Donald Hilvert, and Jack W. Szostak are significant researchers in this field.

Biocompatible click cycloaddition reactions in chemical biology

Recent advances in technology have allowed scientists to view substructures of cells at levels of unprecedented detail. Unfortunately these "aerial" pictures offer little information about the mechanics of the biological system in question. To be fully effective, precise imaging systems require a complementary technique that better elucidates the machinery of a cell. By attaching tracking devices (optical probes) to biomolecules in vivo, one can learn far more about cell metabolism, molecular transport, cell-cell interactions and many other processes

Bioorthogonal reactions

Successful labeling of a molecule of interest requires specific functionalization of that molecule to react chemospecifically with an optical probe. For a labeling experiment to be considered robust, that functionalization must minimally perturb the system. Unfortunately, these requirements can often be extremely hard to meet. Many of the reactions normally available to organic chemists in the laboratory are unavailable in living systems. Water- and redox- sensitive reactions would not proceed, reagents prone to nucleophilic attack would offer no chemospecificity, and any reactions with large kinetic barriers would not find enough energy in the relatively low-heat environment of a living cell. Thus, chemists have recently developed a panel of bioorthogonal chemistry that proceed chemospecifically, despite the milieu of distracting reactive materials in vivo.

Design of bioorthogonal reagents and bioorthogonal chemical reporters

The coupling of an optical probe to a molecule of interest must occur within a reasonably short time frame; therefore, the kinetics of the coupling reaction should be highly favorable. Click chemistry is well suited to fill this niche, since click reactions are, by definition, rapid, spontaneous, selective, and high-yielding. Unfortunately, the most famous "click reaction," a [3+2] cycloaddition between an azide and an acyclic alkyne, is copper-catalyzed, posing a serious problem for use in vivo due to copper's toxicity.

The issue of copper toxicity can be alleviated using copper-chelating ligands, enabling copper-catalyzed labeling of the surface of live cells.

To bypass the necessity for a catalyst, the lab of Dr. Carolyn Bertozzi introduced inherent strain into the alkyne species by using a cyclic alkyne. In particular, cyclooctyne reacts with azido-molecules with distinctive vigor. Further optimization of the reaction led to the use of difluorinated cyclooctynes (DIFOs), which increased yield and reaction rate. Other coupling partners discovered by separate labs to be analogous to cyclooctynes include trans cyclooctene, norbornene, and a cyclobutene-functionalized molecule.

Use in biological systems

As mentioned above, the use of bioorthogonal reactions to tag biomolecules requires that one half of the reactive "click" pair is installed in the target molecule, while the other is attached to an optical probe. When the probe is added to a biological system, it will selectively conjugate with the target molecule.

The most common method of installing bioorthogonal reactivity into a target biomolecule is through metabolic labeling. Cells are immersed in a medium where access to nutrients is limited to synthetically modified analogues of standard fuels such as sugars. As a consequence, these altered biomolecules are incorporated into the cells in the same manner as their wild-type brethren. The optical probe is then incorporated into the system to image the fate of the altered biomolecules. Other methods of functionalization include enzymatically inserting azides into proteins, and synthesizing phospholipids conjugated to cyclooctynes.

Future directions

As these bioorthogonal reactions are further optimized, they will likely be used for increasingly complex interactions involving multiple different classes of biomolecules. More complex interactions have a smaller margin for error, so increased reaction efficiency is paramount to continued success in optically probing cellular machinery. Also, by minimizing side reactions, the experimental design of a minimally perturbed living system is closer to being realized.

Discovery of biomolecules through metagenomics

The advances in modern sequencing technologies in the late 1990s allowed scientists to investigate DNA of communities of organisms in their natural environments, so-called "eDNA", without culturing individual species in the lab. This metagenomic approach enabled scientists to study a wide selection of organisms that were previously not characterized due in part to an incompetent growth condition. These sources of eDNA include, but are not limited to, soils, ocean, subsurface, hot springs, hydrothermal vents, polar ice caps, hypersaline habitats, and extreme pH environments. Of the many applications of metagenomics, chemical biologists and microbiologists such as Jo Handelsman, Jon Clardy, and Robert M. Goodman who are pioneers of metagenomics, explored metagenomic approaches toward the discovery of biologically active molecules such as antibiotics.

Functional or homology screening strategies have been used to identify genes that produce small bioactive molecules. Functional metagenomic studies are designed to search for specific phenotypes that are associated with molecules with specific characteristics. Homology metagenomic studies, on the other hand, are designed to examine genes to identify conserved sequences that are previously associated with the expression of biologically active molecules.

Functional metagenomic studies enable scientists to discover novel genes that encode biologically active molecules. These assays include top agar overlay assays where antibiotics generate zones of growth inhibition against test microbes, and pH assays that can screen for pH change due to newly synthesized molecules using pH indicator on an agar plate. Substrate-induced gene expression screening (SIGEX), a method to screen for the expression of genes that are induced by chemical compounds, has also been used to search genes with specific functions. These led to the discovery and isolation of several novel proteins and small molecules. For example, the Schipper group identified three eDNA derived AHL lactonases that inhibit biofilm formation of Pseudomonas aeruginosa via functional metagenomic assays. However, these functional screening methods require a good design of probes that detect molecules being synthesized and depend on the ability to express metagenomes in a host organism system.

In contrast, homology metagenomic studies led to a faster discovery of genes that have homologous sequences as the previously known genes that are responsible for the biosynthesis of biologically active molecules. As soon as the genes are sequenced, scientists can compare thousands of bacterial genomes simultaneously. The advantage over functional metagenomic assays is that homology metagenomic studies do not require a host organism system to express the metagenomes, thus this method can potentially save the time spent on analyzing nonfunctional genomes. These also led to the discovery of several novel proteins and small molecules. For example, Banik et al. screened for clones containing genes associated with the synthesis of teicoplanin and vancomycin-like glycopeptide antibiotics and found two new biosynthetic gene clusters. In addition, an in silico examination from the Global Ocean Metagenomic Survey found 20 new lantibiotic cyclases.

There are challenges to metagenomic approaches to discover new biologically active molecules. Only 40% of enzymatic activities present in a sample can be expressed in E. coli.. In addition, the purification and isolation of eDNA is essential but difficult when the sources of obtained samples are poorly understood. However, collaborative efforts from individuals from diverse fields including bacterial genetics, molecular biology, genomics, bioinformatics, robots, synthetic biology, and chemistry can solve this problem together and potentially lead to the discovery of many important biologically active molecules.

Protein phosphorylation

Posttranslational modification of proteins with phosphate groups has proven to be a key regulatory step throughout all biological systems. Phosphorylation events, either phosphorylation by protein kinases or dephosphorylation by phosphatases, result in protein activation or deactivation. These events have an immense impact on the regulation of physiological pathways, which makes the ability to dissect and study these pathways integral to understanding the details of cellular processes. There exist a number of challenges—namely the sheer size of the phosphoproteome, the fleeting nature of phosphorylation events and related physical limitations of classical biological and biochemical techniques—that have limited the advancement of knowledge in this area. A recent review provides a detailed examination of the impact of newly developed chemical approaches to dissecting and studying biological systems both in vitro and in vivo.

Through the use of a number of classes of small molecule modulators of protein kinases, chemical biologists have been able to gain a better understanding of the effects of protein phosphorylation. For example, nonselective and selective kinase inhibitors, such as a class of pyridinylimidazole compounds described by Wilson, et al., are potent inhibitors useful in the dissection of MAP kinase signaling pathways. These pyridinylimidazole compounds function by targeting the ATP binding pocket. Although this approach, as well as related approaches, with slight modifications, has proven effective in a number of cases, these compounds lack adequate specificity for more general applications. Another class of compounds, mechanism-based inhibitors, combines detailed knowledge of the chemical mechanism of kinase action with previously utilized inhibition motifs. For example, Parang, et al. describe the development of a "bisubstrate analog" that inhibits kinase action by binding both the conserved ATP binding pocket and a protein/peptide recognition site on the specific kinase. While there is no published in vivo data on compounds of this type, the structural data acquired from in vitro studies have expanded the current understanding of how a number of important kinases recognize target substrates. Interestingly, many research groups utilized ATP analogs as a chemical probe to study kinases and identify their substrates.

The development of novel chemical means of incorporating phosphomimetics into proteins has provided important insight into the effects of phosphorylation events. Historically, phosphorylation events have been studied by mutating an identified phosphorylation site (serine, threonine or tyrosine) to an amino acid, such as alanine, that cannot be phosphorylated. While this approach has been successful in some cases, mutations are permanent in vivo and can have potentially detrimental effects on protein folding and stability. Thus, chemical biologists have developed new ways of investigating protein phosphorylation. By installing phospho-serine, phospho-threonine or analogous phosphonate mimics into native proteins, researchers are able to perform in vivo studies to investigate the effects of phosphorylation by extending the amount of time a phosphorylation event occurs while minimizing the often-unfavorable effects of mutations. Protein semisynthesis, or more specifically expressed protein ligation (EPL), has proven to be successful techniques for synthetically producing proteins that contain phosphomimetic molecules at either the C- or the N-terminus. In addition, researchers have built upon an established technique in which one can insert an unnatural amino acid into a peptide sequence by charging synthetic tRNA that recognizes a nonsense codon with an unnatural amino acid. Recent developments indicate that this technique can also be employed in vivo, although, due to permeability issues, these in vivo experiments using phosphomimetic molecules have not yet been possible.

Advances in chemical biology have also improved upon classical techniques of imaging kinase action. For example, the development of peptide biosensors—peptides containing incorporated fluorophore molecules—allowed for improved temporal resolution in in vitro binding assays. Experimental limitations, however, prevent this technique from being effectively used in vivo. One of the most useful techniques to study kinase action is Fluorescence Resonance Energy Transfer (FRET). To utilize FRET for phosphorylation studies, fluorescent proteins are coupled to both a phosphoamino acid binding domain and a peptide that can by phosphorylated. Upon phosphorylation or dephosphorylation of a substrate peptide, a conformational change occurs that results in a change in fluorescence. FRET has also been used in tandem with Fluorescence Lifetime Imaging Microscopy (FLIM) or fluorescently conjugated antibodies and flow cytometry to provide a detailed, specific, quantitative results with excellent temporal and spatial resolution.

Through the augmentation of classical biochemical methods as well as the development of new tools and techniques, chemical biologists have improved accuracy and precision in the study of protein phosphorylation.

Metal complexes in medicine

Metal complexes have many characteristics that can be advantageous in drug design. In comparison to organic-based medicines, metal complexes have many more Coordination Numbers, geometries, and oxidation/reduction states that can be used to make structures that interact with targets in unique ways unavailable to most organic molecules. In addition, the cationic metal is advantageous in complexing with charged targets within biological systems like the phosphate backbone of DNA. Targets of metal-based medicines include DNA, proteins, and enzymes. Each target tupe is described in turn below.

Metal complexes targeting DNA

DNA has been the primary target of metal complexes due to the ability of cationic metal interacting with the anionic backbone of DNA. The anticancer chemotherapy drug cisplatin covalently binds to DNA, which disrupts transcription and leads to programmed cell death. Assuming early detection, cisplatin cures almost all cases of testicular cancer. This drug, however, has severe side effects and great effort is being made to improve drug delivery including attachment to single-walled carbon nanotubes, encapsulation in proteins cages, among other clever strategies.

Another major effort for anticancer metal-based drugs centers around stabilization of the G-quadruplex of DNA. In general, these drugs have a non-covalent interaction with the G-quadruplex as well as a planar positively charged structure.

Metal complexes targeting enzymes and proteins

Though DNA has been a primary target for inorganic medicines, enzymes, and proteins also can be modulated through interactions with these compounds. Metal complexes can interact with the amino acids with the highest reduction potential (histidine, cysteine, and selenocysteine). Metals used in such complexes include gold, platinum, ruthenium, vanadium, cobalt and others. Several new potential therapeutic complexes are currently in the process of discovery and investigation.

Gold

Some gold complexes are showing potential as medicines. A rheumatoid arthritis drug (auranofin, a gold(I) phosphine complex) has shown value in treating parasitic disease through inhibiting thioredoxin glutathione reductase.

Platinum

Along with cisplatin, many other platinum complexes are potential therapeutics. Like auranofin, terpyridine platinum inhibits thioredoxin reductase with nanomolar IC50. This complex also is an inhibitor of the common target enzyme topoisomerase I. Yet another family of complexes with potential anticancer properties are dichloro(SMP)-platinum(II) complexes. These complexes target the matrix metalloproteinase, where the complex coordinates with amino acids of the enzyme in the coordination sites previously held by chlorides, and through the smp ligand. As seen by these few examples, platinum complexes are a particularly active area of research for metal-based medicines.

Ruthenium

Ruthenium complexes have anticancer activity. A library of glutathione transferase inhibitors were created through a combination of ethacrynic acid (a known inhibitor of the enzyme) and ruthenium complexes.

Vanadium

Vanadium complexes have been used in multiple therapeutic settings. A new area in which vanadium may have a great medicinal impact is through the oxovanadium porphyrin complexes. These complexes have demonstrated HIV-1 reverse transcriptase inhibition in vitro.

Issues and outlook

Though there is currently much excitement in the field of metal-based medicines, many challenges still face researchers. One such challenge is selectivity of complexes in vivo. Many of these complexes can bind to common proteins like serum albumin in addition to other proteins with amino acids that are common in protein–metal complex interactions like histidine, cysteine, and selenocysteine. Along with selectivity issues, much is yet unknown about mechanisms through which metal complexes interact with proteins. How complexing between a given metal complex and target protein or enzyme occurs is often unknown or unclear and requires much more elucidation before truly effective metal complexes can be designed and delivered. Currently, physicians utilize very few metal-based medicines in the clinics. For example, none of the 21 drugs approved by the U.S. Food and Drug Administration (FDA) in 2008 were inorganic. However, with the success of cisplatin in cancer treatment, it is not unreasonable to anticipate more metal complexes will be actively used in the treatment of diseases.

Synthetic biology

Synthetic biology focuses on the manipulation of biological components to form new systems or the generation of living systems with synthetic parts. The canonical idea of synthetic biology is the creation of new life, but recently it has come to include bioengineering in terms of the use of interchangeable components to give novel outputs. In the search for modular parts, it is most facile if the building blocks contribute independently to the function of the whole unit so that the modules can be recombined in predictable ways. It is useful for synthetic biologists to define "life": in this context, to be alive an organism must be capable of Darwinian evolution – genetic mutation, self-replication and inheritance of mutations.

Synthetic cells

J. Craig Venter's group has created the first "synthetic" cell – the first cells to exist with fully synthetic DNA. Venter was able to manipulate the synthetic genome to dictate the proteins expressed in the organism. Note that these were not fully synthetic cells but that the synthetic DNA was able to take over all metabolic processes necessary for cell survival and proliferation.

DNA as interchangeable parts

DNA is composed of repeating modular units consisting of an anion phosphate group that forms the polyanion backbone, and nucleotide base pairs that engage in Watson-Crick base pairing to form the double strand. Because the molecular recognition of DNA is based mostly on the polyanion backbone, the nucleotides can be modified without altering the structural integrity of the DNA. Steven Benner's group has generated an artificial genetic alphabet of eight new base pairs that can be amplified by polymerase chain reaction; this indicates that these base pairs can be used in systems that undergo Darwinian evolution.

Proteins as interchangeable parts

Amino acids

Amino acids are poor modular building-blocks because they do not act independently and there is a fundamental lack of understanding about the relationship between linear amino acid sequences and the folding and functionality of proteins. Chemical biologists have been able to model, design, and synthesize peptides and evaluate their function.

Protein secondary structure

Modules consisting of protein secondary structure can be designed to perform specific functions; for example, it has been demonstrated that alpha helices can be used as functional peptide catalysts. The Ghadiri group has created a template peptide that promotes the ligation of two modified helices by bringing the helices into close proximity by specifically designed hydrophobic interactions of the helices with the template.

Folded proteins

Fully folded proteins can be combined in novel ways to generate specific non-natural outcomes. This is highly useful commercially from drug development to the production of polymers – one can imagine the economic benefits if scientists can design systems in which proteins catalyze reactions without the necessity of excessive human intervention to produce commercially relevant materials. For example, the Keasling group has developed a series of proteins that catalyze conversion of acetyl CoA, a common cellular metabolite, into a precursor for the potent antimalarial drug artemisinin.

Modifying molecular switches

Signaling pathways can be modified to be turned on or off by non-natural ligands or inputs to the system. For instance, systems can be modified so that they are autoinhibited by non-natural proteins that release their inhibition upon binding with a specific molecule that is different from the natural signaling molecule of the path. This allows new approaches to studying signal circuits specifically and with user-designed inputs.

Chemical approaches to stem-cell biology

Advances in stem-cell biology have typically been driven by discoveries in molecular biology and genetics. These have included optimization of culture conditions for the maintenance and differentiation of pluripotent and multipotent stem-cells and the deciphering of signaling circuits that control stem-cell fate. However, chemical approaches to stem-cell biology have recently received increased attention due to the identification of several small molecules capable of modulating stem-cell fate in vitro. A small molecule approach offers particular advantages over traditional methods in that it allows a high degree of temporal control, since compounds can be added or removed at will, and tandem inhibition/activation of multiple cellular targets.

Small molecules that modulate stem-cell behavior are commonly identified in high-throughput screens. Libraries of compounds are screened for the induction of a desired phenotypic change in cultured stem-cells. This is usually observed through activation or repression of a fluorescent reporter or by detection of specific cell surface markers by FACS or immunohistochemistry. Hits are then structurally optimized for activity by the synthesis and screening of secondary libraries. The cellular targets of the small molecule can then be identified by affinity chromatography, mass spectrometry, or DNA microarray.

A trademark of pluripotent stem-cells, such as embryonic stem-cells (ESCs), is the ability to self-renew indefinitely. The conventional use of feeder cells and various exogenous growth factors in the culture of ESCs presents a problem in that the resulting highly variable culture conditions make the long-term expansion of un-differentiated ESCs challenging. Ideally, chemically defined culture conditions could be developed to maintain ESCs in a pluripotent state indefinitely. Toward this goal, the Schultz and Ding labs at the Scripps Research Institute identified a small molecule that can preserve the long-term self-renewal of ESCs in the absence of feeder cells and other exogenous growth factors. This novel molecule, called pluripotin, was found to simultaneously inhibit multiple differentiation inducing pathways.

The utility of stem-cells is in their ability to differentiate into all cell types that make up an organism. Differentiation can be achieved in vitro by favoring development toward a particular cell type through the addition of lineage specific growth factors, but this process is typically non-specific and generates low yields of the desired phenotype. Alternatively, inducing differentiation by small molecules is advantageous in that it allows for the development of completely chemically defined conditions for the generation of one specific cell type. A small molecule, neuropathiazol, has been identified which can specifically direct differentiation of multipotent neural stem cells into neurons. Neuropathiazol is so potent that neurons develop even in conditions that normally favor the formation of glial cells, a powerful demonstration of controlling differentiation by chemical means.

Because of the ethical issues surrounding ESC research, the generation of pluripotent cells by reprogramming existing somatic cells into a more "stem-like" state is a promising alternative to the use of standard ESCs. By genetic approaches, this has recently been achieved in the creation of ESCs by somatic cell nuclear transfer and the generation of induced pluripotent stem-cells by viral transduction of specific genes. From a therapeutic perspective, reprogramming by chemical means would be safer than genetic methods because induced stem-cells would be free of potentially dangerous transgenes. Several examples of small molecules that can de-differentiate somatic cells have been identified. In one report, lineage-committed myoblasts were treated with a compound, named reversine, and observed to revert to a more stem-like phenotype. These cells were then shown to be capable of differentiating into osteoblasts and adipocytes under appropriate conditions.

Stem-cell therapies are currently the most promising treatment for many degenerative diseases. Chemical approaches to stem-cell biology support the development of cell-based therapies by enhancing stem-cell growth, maintenance, and differentiation in vitro. Small molecules that have been shown to modulate stem-cell fate are potential therapeutic candidates and provide a natural lean-in to pre-clinical drug development. Small molecule drugs could promote endogenous stem-cells to differentiate, replacing previously damaged tissues and thereby enhancing the body's own regenerative ability. Further investigation of molecules that modulate stem-cell behavior will only unveil new therapeutic targets.

Fluorophores and techniques to tag proteins

Organisms are composed of cells that, in turn, are composed of macromolecules, e.g. proteins, ribosomes, etc. These macromolecules interact with each other, changing their concentration and suffering chemical modifications. The main goal of many biologists is to understand these interactions, using MRI, ESR, electrochemistry, and fluorescence among others. The advantages of fluorescence reside in its high sensitivity, non-invasiveness, safe detection, and ability to modulate the fluorescence signal. Fluorescence was observed mainly from small organic dyes attached to antibodies to the protein of interest. Later, fluorophores could directly recognize organelles, nucleic acids, and important ions in living cells. In the past decade, the discovery of green fluorescent protein (GFP), by Roger Y. Tsien, hybrid system and quantum dots have enable assessing protein location and function more precisely. Three main types of fluorophores are used: small organic dyes, green fluorescent proteins, and quantum dots. Small organic dyes usually are less than 1 kD, and have been modified to increase photostability, enhance brightness, and reduce self-quenching. Quantum dots have very sharp wavelength, high molar absorptivity and quantum yield. Both organic dyes and quantum dyes do not have the ability to recognize the protein of interest without the aid of antibodies, hence they must use immunolabeling. Since the size of the fluorophore-targeting complex typically exceeds 200 kD, it might interfere with multiprotein recognition in protein complexes, and other methods should be use in parallel. An advantage includes diversity of properties and a limitation is the ability of targeting in live cells. Green fluorescent proteins are genetically encoded and can be covalently fused to your protein of interest. A more developed genetic tagging technique is the tetracysteine biarsenical system, which requires modification of the targeted sequence that includes four cysteines, which binds membrane-permeable biarsenical molecules, the green and the red dyes "FlAsH" and "ReAsH", with picomolar affinity. Both fluorescent proteins and biarsenical tetracysteine can be expressed in live cells, but present major limitations in ectopic expression and might cause lose of function. Giepmans shows parallel applications of targeting methods and fluorophores using GFP and tetracysteine with ReAsH for α-tubulin and β-actin, respectively. After fixation, cells were immunolabeled for the Golgi matrix with QD and for the mitochondrial enzyme cytochrome with Cy5.

Protein dynamics

Fluorescent techniques have been used assess a number of protein dynamics including protein tracking, conformational changes, protein–protein interactions, protein synthesis and turnover, and enzyme activity, among others.

Three general approaches for measuring protein net redistribution and diffusion are single-particle tracking, correlation spectroscopy and photomarking methods. In single-particle tracking, the individual molecule must be both bright and sparse enough to be tracked from one video to the other. Correlation spectroscopy analyzes the intensity fluctuations resulting from migration of fluorescent objects into and out of a small volume at the focus of a laser. In photomarking, a fluorescent protein can be dequenched in a subcellular area with the use of intense local illumination and the fate of the marked molecule can be imaged directly. Michalet and coworkers used quantum dots for single-particle tracking using biotin-quantum dots in HeLa cells.

One of the best ways to detect conformational changes in proteins is to sandwich said protein between two fluorophores. FRET will respond to internal conformational changes result from reorientation of the fluorophore with respect to the other. Dumbrepatil sandwiched an estrogen receptor between a CFP (cyan fluorescent protein) and a YFP (yellow fluorescent protein) to study conformational changes of the receptor upon binding of a ligand.

Fluorophores of different colors can be applied to detect their respective antigens within the cell. If antigens are located close enough to each other, they will appear colocalized and this phenomenon is known as colocalization. Specialized computer software, such as CoLocalizer Pro, can be used to confirm and characterize the degree of colocalization.

FRET can detect dynamic protein–protein interaction in live cells providing the fluorophores get close enough. Galperin et al. used three fluorescent proteins to study multiprotein interactions in live cells.

Tetracysteine biarsenical systems can be used to study protein synthesis and turnover, which requires discrimination of old copies from new copies. In principle, a tetracysteine-tagged protein is labeled with FlAsH for a short time, leaving green labeled proteins. The protein synthesis is then carried out in the presence of ReAsH, labeling the new proteins as red.

One can also use fluorescence to see endogenous enzyme activity, typically by using a quenched activity based proteomics (qABP). Covalent binding of a qABP to the active site of the targeted enzyme will provide direct evidence concerning if the enzyme is responsible for the signal upon release of the quencher and regain of fluorescence.

The unique combination of high spatial and temporal resolution, nondestructive compatibility with living cells and organisms, and molecular specificity insure that fluorescence techniques will remain central in the analysis of protein networks and systems biology.

Applications of DNA microarrays in chemical biology

Planar surfaces functionalized with single- or double-strand nucleic acids have enabled researchers to address a variety of salient biological and biochemical questions in recent years. The general architecture of modern DNA microarrays reflects the historical progression from the sequence-specific probing of whole chromosomes immobilized on glass slides (as early as 1961 with fluorescent in situ hybridization) and the low-density porous membrane arrays available since the early 1990s, to the high-density (10²-10⁴ features/mm²) solid support platforms that exist today. The massively parallel processing capabilities of these picomolar-range contemporary arrays provide for the generation of large data sets and multiplexed analysis. Furthermore, several top-down and bottom-up assembly methodologies provide researchers with the option for "in-house" production of arrays from custom oligonucleotide libraries or the use of commercial genome chips, notably those developed by Affymetrix and Agilent Technologies.

DNA microarrays can be used to conduct several general types of experiments, most of which relying on the hybridization of fluorescently labeled single-strand DNA molecules isolated from a biological sample to their single-strand complement probes presented on an array. One of the earliest conceived applications for DNA microarrays was for single-nucleotide polymorphism (SNP) genotyping. Since SNPs are a "quick and dirty" approach to detect genetic indicators of pathologies and lineages, arrays in theory provide a facile method for diagnosis; this was confirmed experimentally in the late 1990s in the successful SNP analysis of human tumors. Although there are currently commercially available arrays (e.g. bovine mapping chips) to characterize SNPs, it seems likely that the nascent availability of high-throughput and low-cost pyrosequencing will become the preferred method of recognition, or replace the need for SNP detection altogether with rapid whole-genome sequencing.

A different application of microarray technology that has become the gold standard for RNA analysis in recent years is the widespread utilization of expression microarrays, or "gene chips". Gene chip preparation calls for the quantitative reverse transcription of the total cellular RNA pool into labeled and fragmented single-strand DNA prior to hybridization-based capture. Up- and down-regulation of genes in response to stressors or disease states are quantitatively compared in cell lines and organisms. Coupled expression microarray and quantitative proteomics experiments have allowed for the in-depth exploration of the oftentimes non-linear relationship between the abundance of a particular transcribed message and that of its corresponding translated protein. These integrative studies, partially enabled by quantitative DNA microarray technology, have been successfully applied to a variety of biological systems, including yeast, bovine, mouse, bacterial, and human. The expression analysis community has amassed such a significant amount of expression microarray data that they are freely available in public databases.

These types of surfaces can also be used to analyze DNA-protein interactions on a genome-wide scale via chromatin immunoprecipitation, followed by an array-based analysis of the DNA (ChIP-chip). ChIP-chip experiments are enabled by the co-purification of a DNA-binding protein of interest with its corresponding genomic loci when a cross-linked chromatin extract is probed with an antibody to said protein. After purification, amplification and labeling, the DNA is applied to a microarray representing the entire genome; the data are plotted as a histogram that resolves the specific genomic regions associated with that protein. ChIP-chip experiments have provided the scientific community with a wealth of information about the steady-state genomic locations of DNA-binding proteins, such as histones, transcription factors, and polymerase machinery, and have also been successfully applied to studies on the dynamics of transcription factor binding. The data from these experiments may be further manipulated to computationally derive consensus binding sequences for some transcription factors, giving the opportunity for insight into the in vivo behavior of the factor, deeper than simple information about localization.

DNA microarrays are also amenable to the direct analysis of protein–DNA interactions in kinetic binding assays as analyzed by surface plasmon resonance (SPR). This experimental approach also relies on single-strand DNA immobilized on a high-density array; however, the quantitative readout is based on a change in the optical properties of the DNA-functionalized surface when a protein flowed over the surface binds to the sequence in a particular surface feature. DNA-functionalized arrays analyzed with SPR in this way have yielded kinetic data regarding fundamental molecular biological processes. Recently, SPR analysis of a DNA microarray and components of the DNA replication machinery helped to elucidate the biochemical nuances of the replication fork.

High-density DNA microarrays have emerged as an important component of the chemical biology toolkit. The existing technology allows for the construction of customizable, as well as general, arrays and provides researchers with the opportunity to generate robust data from many different types of biological inputs. Considering the relatively recent shift in the scientific community away from binary perturbation/readout studies and toward "big science" and large data sets, it seems likely that DNA microarrays will continue to enable pertinent biological research for many years to come.

Applications of Chemical Biology in Drug Discovery

Chemical biology approaches can help answer important questions of relevance to small molecule drug discovery projects. This includes questions related to the characterization of protein targets and the molecular pharmacology of small molecule drugs that modulate target function.

Chemical Biology Approaches to Characterize Protein Targets

Protein Target Characterization

What is my target?

- Affinity chemoproteomics can be used to identify the targets of compounds that have shown activity in a phenotypic or pathway screen. For example, this approach was used to identify of MTH1 as a potential anticancer target of the (S) form of Crizotinib. Affinity chemoproteomics technology was also used to identify the BET bromodomains as the molecular targets of a series of small molecule modulators of Apolipoprotein A1 (ApoA1).- Mutagensis is important target validation approach is to generate a catalytically-dead target protein to tease apart the role of the enzymatic and scaffolding functions. This method is often used in kinase validation experiments where the pharmacological performance of a putative inhibitor is tested in cells with the kinase knocked out and replaced through transfection of the recombinant wild-type protein or its kinase dead mutant. This approach was used recently to confirm DCLK1 as the functional anti-proliferative target of a previously developed LRRK2 inhibitor for Parkinson's disease

Does my target exist in multiple related forms and does this vary across different tissues and species?

- Genotype-tissue expression project (GTEx) has generated a database of mRNA expression levels across multiple tissues. An illustration of the use of this resource shows that PDE4B is expressed in multiple forms that vary across human tissues. Being aware of these different forms has enabled development of a selective inhibitor targeting PDE4B in the brain through taking advantage of differences in sequences in the long and short forms outside the active site.

What is the endogenous ligand for my target and its concentration in the diseased state?

- Metabolomics in combination with clickable chemical reporters can enable chemoproteomic methods to be applied to substrate identification, as shown recently for protein lipidation.

Is my target post-translationally modified?

- Dimedone chemical probes that chemoselectively react with sulfenic acid residues in EGFR have shown how cysteine oxidation can be use for enzymatic regulation.

What is the turnover of my target and is this affected by my compound?

- Chemical probes of BTK were used to confirm its slow turnover rate.- Immunoprecipitation of EGFR has been used to confirm increased protein half-livey, likely due to a concomitant decrease in binding to the Cbl ubiquitin ligase responsible for regulating the degradation of EGFR. This increased half-life could result in a need to change the projected dose or frequency. In addition to mutations altering the turnover rate of a protein, small molecule drugs can also change the turnover and thus the amount of protein in the cell. For instance, many kinases are known to be clients of the Hsp90-Cdc37 chaperone system and recent studies have found that some ATP-competitive inhibitors disrupt the ability of Cdc37 to bind to the target kinase and recruit it to Hsp9015.

What potential off- targets are most closely related to my target sequence and function?

- Computational biology and structural bioinformatic analyses of protein sequences and binding pocket topology can be used to delineate which proteins are similar to a desired target's biological sequence as well as its binding site shape. Additionally, chemoinformatic approaches using in silico tools to predict potential off-targets based on the structural similarity between a novel small molecule and previously known compounds with different pharmacology are also useful.

Chemical Biology Approaches to Characterize Molecular Pharmacology

Molecular Pharmacology Characterization

What target(s) and off-targets does my molecule bind?

- Activity probes such as nucleotide acyl phosphates (KiNativ™) probes that react chemoselectively with the catalytic lysine of >80% of all known kinases can be used to determine many of the kinase targets of a molecule.- ABPP using probes derived from covalent kinase inhibitors, have been used to determine the cellular targets of these covalent inhibitors in living cells.- Chemoproteomics was used to assess the multiple binding partners of HDAC inhibitors in protein complexes scaffolded by ELM-SANT domain subunits. Thermal proteomics, where one looks for proteins that show increased thermal stability at elevated temperatures due to compound binding, has been used to identify off-targets of several kinase inhibitors including the BRAF inhibitor Vemurafenib. Affinity capture where a chemical probe is tethered to a solid support, was used to identify cereblon (CRBN) as a thalidomide binding protein.- DARTS (Drug Affinity Responsive Target Stability); takes advantage of a reduction in the protease susceptibility of the target protein upon drug binding. This technique was used to confirm binding biological binding partners for a number of molecules: rapamycin and FKBP with FKBP12 and resveratrol with eIF4A.- SPROX (Stability of Proteins from Rates of Oxidation); utilizes the chemical denaturant-dependent oxidation rates of methionine residues in a protein to report on the thermodynamic properties of its global and/or subglobal unfolding/refolding reactions.

What is the tissue distribution of my molecule?

- Quantitative mass spectrometry (MS) was recently used to determine the concentrations of the chemotherapeutic agent YM155 in various cancer cells.

What are the functional consequences to my target when my molecule binds?

- The use of shRNA gene silencing combined with immunoblotting showed that BRAF inhibitors GDC-0879 and PLX4720 block MAPK signaling in BRAF (V600E) tumors while these same compounds activate the RAF-MEK-ERK pathway in KRAS mutant and RAS/RAF wild-type tumors.

How much target occupancy do I need to drive my relevant biological phenotype?

- A clickable covalent occupancy probe of fatty acid amide hydrolase (FAAH) has been used to relate target engagement to anandamide elevation and efficacy in models of rodent inflammatory and neuropathic pain. Target occupancy quantification can also be used to validate the relevance of a drug target identified from phenotypic screening. A clickable covalent probe of the mRNA decapping enzyme DcpS, which used a sulfonyl fluoride warhead to target a reactive tyrosine in the binding site of the enzyme, enabled the assessment of DcpS target engagement of a diaminoquinazoline inhibitor (developed from a phenotypic screen for the treatment of spinal muscular atrophy).

References

Chemical biology Wikipedia

(Text) CC BY-SA

Contents