An intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure. IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein (alongside globular, fibrous and membrane proteins).
Contents
- History
- Biological roles
- Flexible linkers
- Linear motifs
- Coupled folding and binding
- Disorder in the bound state (fuzzy complexes)
- Structural Aspects
- Experimental validation
- Disorder prediction
- Distinguishing IDPs from well-structured proteins
- Prediction methods
- Disorder and disease
- Computer simulations
- Pioneering IDP research labs
- References
The discovery of IDPs has challenged the traditional protein structure paradigm, which holds that protein function depends on a fixed three-dimensional structure. Over the last decades, increasing evidence from various branches of structural biology has questioned this dogma, suggesting that protein dynamics may be highly relevant for such systems. Despite their lack of stable structure, IDPs are a very large and functionally important class of proteins. In some cases, IDPs can adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs differ from structured proteins in many ways and tend to have distinct properties in terms of function, structure, sequence, interactions, evolution and regulation.
History
From the 1930s to the 1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. When Mirsky and Pauling stated that proteins have just one uniquely defined configuration, they did not recognize that Fischer's 'lock and key' model (1894) would have supported their thesis. These publications solidified the view that the sequence determines the structure, which in turn determines the function of proteins. In 1950, Karush wrote about 'configurational adaptability', contradicting the prevailing assumptions and research of the time. He was convinced that proteins have more than one configuration at the same energy level and can choose one when binding to other substrates. In the 1960s, Levinthal's paradox suggested that a systematic conformational search of a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. seconds to minutes). Curiously, for many (small) proteins or protein domains, relatively rapid and efficient refolding can be observed in vitro. As stated in Anfinsen's dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence), is kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered the native state of such "ordered" proteins.
During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets, indicating that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were "disordered". Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles. It is now generally accepted that proteins exist as an ensemble of similar structures, with some regions more constrained than others. Intrinsically unstructured proteins (IUPs) occupy the extreme end of this spectrum of flexibility, whereas IDPs also include proteins with considerable local structural tendency or flexible multidomain assemblies. These highly dynamic disordered regions of proteins have subsequently been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis.
In the 2000s, bioinformatic predictions indicated that intrinsic disorder is more common in sequenced and predicted proteomes than in known structures in the Protein Data Bank. Based on DISOPRED2 predictions, long (>30 residue) disordered segments occur in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins. In 2001, Dunker published his paper 'Intrinsically Disordered Proteins', asking whether this newly found information had been ignored for 50 years.
In the 2010s it became clear that IDPs are highly abundant among disease-related proteins.
Biological roles
For many disordered proteins, the binding affinity for their receptors is regulated by post-translational modification. It has therefore been proposed that the flexibility of disordered proteins helps them meet the different conformational requirements for binding both the modifying enzymes and their receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling, transcription and chromatin remodeling.
Flexible linkers
Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connected domains to twist and rotate freely to recruit their binding partners via protein domain dynamics. They also allow binding partners to induce larger-scale conformational changes by long-range allostery.
Linear motifs
Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars etc.). Many roles of linear motifs are associated with cell regulation, for instance in control of cell shape, subcellular localisation of individual proteins and regulated protein turnover. Post-translational modifications such as phosphorylation often tune the affinity of individual linear motifs for specific interactions, sometimes by several orders of magnitude. Because linear motifs evolve relatively rapidly and need only a small number of structural restraints to establish novel (low-affinity) interfaces, they are particularly challenging to detect. Their widespread biological roles, and the fact that many viruses mimic or hijack linear motifs to efficiently recode infected cells, underline the importance of research on this topic. Unlike globular proteins, IDPs do not have spatially arranged active pockets. Nevertheless, in about 80% of the roughly three dozen IDPs subjected to detailed structural characterization by NMR, there are linear motifs termed PreSMos (pre-structured motifs): transient secondary structural elements primed for target recognition. In several cases it has been demonstrated that these transient structures become full and stable secondary structures, e.g. helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.
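Linear motif definitions are commonly written as regular expressions. The sketch below is a minimal illustration, not the method of any particular motif resource: it scans a sequence with a hypothetical pattern modeled on the classic monopartite nuclear localisation signal consensus K-(K/R)-X-(K/R) and keeps only hits that lie entirely within residues flagged as disordered. The pattern, the function name `motif_hits_in_disorder` and the source of the disorder mask are assumptions.

```python
import re

# Hypothetical example pattern (assumption): the monopartite NLS-like
# consensus K-(K/R)-X-(K/R). The lookahead lets overlapping hits be reported.
NLS_LIKE = re.compile(r"(?=(K[KR].[KR]))")


def motif_hits_in_disorder(sequence, disorder_mask, pattern=NLS_LIKE):
    """Return (start, motif) pairs for hits lying entirely in disordered residues.

    sequence      -- protein sequence in one-letter code
    disorder_mask -- one boolean per residue (True = disordered), e.g. taken
                     from any disorder predictor (an assumption of this sketch)
    """
    if len(sequence) != len(disorder_mask):
        raise ValueError("mask must have one entry per residue")
    hits = []
    for m in pattern.finditer(sequence):
        motif = m.group(1)
        start, end = m.start(), m.start() + len(motif)
        # Keep the hit only if every residue of the motif is disordered.
        if all(disorder_mask[start:end]):
            hits.append((start, motif))
    return hits


seq = "MSTPKKRKVEDLLAAVIGF"
mask = [True] * 10 + [False] * 9
print(motif_hits_in_disorder(seq, mask))
```

Restricting hits to disordered segments mirrors the observation that linear motifs reside in disordered regions. Short patterns like this one also match by chance fairly often, which is part of why motif detection is difficult.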
Coupled folding and binding
Many unstructured proteins undergo transitions to more ordered states upon binding to their targets (e.g. Molecular Recognition Features (MoRFs)). The coupled folding and binding may be local, involving only a few interacting residues, or it might involve an entire protein domain. It was recently shown that coupled folding and binding allows the burial of a large surface area, which fully structured proteins could achieve only if they were much larger. Moreover, certain disordered regions might serve as "molecular switches" that regulate biological functions by switching to an ordered conformation upon molecular recognition, such as small-molecule binding, DNA/RNA binding or ion interactions.
The ability of disordered proteins to bind, and thus to exert a function, shows that a stable structure is not a prerequisite for function. Many short functional sites, for example short linear motifs, are over-represented in disordered proteins.
Disorder in the bound state (fuzzy complexes)
Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulating the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing.
Structural Aspects
Intrinsically disordered proteins adopt many different structures in vivo according to the cell's conditions, creating a structural or conformational ensemble. Their structures are therefore strongly function-related. However, only a few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. The term intrinsically disordered protein (IDP) therefore includes proteins that contain IDRs as well as fully disordered proteins.
The existence and kind of protein disorder is encoded in the amino acid sequence. In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, a combination usually summarized as low hydrophobicity. This composition favors interactions with water. Furthermore, a high net charge promotes disorder through electrostatic repulsion between like-charged residues. As a result, disordered sequences cannot bury enough hydrophobic residues in a core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding (see Biological roles).

Many disordered proteins reveal regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops: while the latter are rigid and adopt only one set of Ramachandran angles, IDPs sample multiple sets of angles. The term flexibility is also used for well-structured proteins, but there it describes a different phenomenon: in structured proteins, flexibility refers to fluctuations around a single equilibrium state, whereas IDPs have no such single state. Many disordered proteins also contain low-complexity sequences, i.e. sequences with an over-representation of a few residues. While low-complexity sequences are a strong indication of disorder, the reverse is not necessarily true: not all disordered proteins have low-complexity sequences. Disordered proteins have a low content of predicted secondary structure.
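The two sequence features above, low mean hydrophobicity and high net charge, are combined in one published heuristic, the charge-hydropathy plot of Uversky and colleagues. It compares a sequence's mean absolute net charge <R> with its mean Kyte-Doolittle hydropathy <H>, rescaled to the range 0 to 1, and places the boundary between folded and unfolded proteins at <H> = (<R> + 1.151)/2.785. The following is a simplified sketch under stated assumptions: it averages hydropathy over the whole sequence rather than over sliding windows, counts only K, R, D and E as charged, and its function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def charge_hydropathy(sequence):
    """Return (<R>, <H>): mean absolute net charge and mean rescaled hydropathy."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0 or any(aa not in KD for aa in seq):
        raise ValueError("expected a non-empty sequence of standard residues")
    # Simplification (assumption): only K/R (+1) and D/E (-1) carry charge.
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    mean_net_charge = abs(positive - negative) / n
    # Rescale Kyte-Doolittle values from [-4.5, 4.5] to [0, 1]; whole-sequence
    # average instead of a windowed average (simplification).
    mean_hydropathy = sum((KD[aa] + 4.5) / 9.0 for aa in seq) / n
    return mean_net_charge, mean_hydropathy


def predicted_unfolded(sequence):
    """True if the sequence lies on the 'unfolded' side of the boundary line."""
    r, h = charge_hydropathy(sequence)
    boundary_h = (r + 1.151) / 2.785
    return h < boundary_h


print(predicted_unfolded("E" * 10), predicted_unfolded("I" * 10))
```

A highly charged, hydrophilic sequence such as poly-glutamate falls on the unfolded side, while a hydrophobic, uncharged sequence does not. The plot gives one verdict per protein; it does not locate disordered regions within a sequence.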
Experimental validation
Intrinsically unfolded proteins, once purified, can be identified by various experimental methods. The primary method to obtain information on disordered regions of a protein is NMR spectroscopy. The lack of electron density in X-ray crystallographic studies may also be a sign of disorder.
Folded proteins have a high density (partial specific volume of 0.72-0.74 mL/g) and a commensurately small radius of gyration. Hence, unfolded proteins can be detected by methods that are sensitive to molecular size, density or hydrodynamic drag, such as size exclusion chromatography, analytical ultracentrifugation, small angle X-ray scattering (SAXS) and measurements of the diffusion constant. Unfolded proteins are also characterized by their lack of secondary structure, as assessed by far-UV (170-250 nm) circular dichroism (especially a pronounced minimum at ~200 nm) or infrared spectroscopy. Unfolded proteins also have backbone peptide groups exposed to solvent, so that they are readily cleaved by proteases, undergo rapid hydrogen-deuterium exchange and exhibit a small dispersion (<1 ppm) in their 1H amide chemical shifts as measured by NMR. (Folded proteins typically show dispersions as large as 5 ppm for the amide protons.) Recently, new methods including Fast parallel proteolysis (FASTpp) have been introduced, which make it possible to determine the fraction folded or disordered without the need for purification. Even subtle differences in the stability of missense mutants, protein partner binding and (self-)polymerisation-induced folding of (e.g.) coiled coils can be detected using FASTpp, as recently demonstrated for the tropomyosin-troponin protein interaction. Fully unstructured protein regions can be experimentally validated by their hypersusceptibility to proteolysis using short digestion times and low protease concentrations.
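The link between density and size can be made concrete. A folded protein of molar mass M and partial specific volume v̄ occupies roughly V = M v̄ / N_A, and if it is idealized as a uniform solid sphere of radius R, its radius of gyration is sqrt(3/5) R. The sketch below assumes this idealization (real proteins are not perfect spheres, and the hydration shell is ignored); a disordered chain of the same mass is expected to be considerably larger, which is what size-sensitive methods detect. The function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1
CM3_TO_A3 = 1e24          # 1 cm^3 = 10^24 cubic angstroms


def compact_sphere_rg(mass_da, partial_specific_volume=0.73):
    """Radius of gyration (angstroms) of a protein idealized as a uniform sphere.

    mass_da                 -- molar mass in daltons (g/mol)
    partial_specific_volume -- in mL/g (= cm^3/g); folded proteins ~0.72-0.74
    """
    # Volume of one molecule, converted from cm^3 to cubic angstroms.
    volume_a3 = mass_da * partial_specific_volume / AVOGADRO * CM3_TO_A3
    radius = (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)
    # For a uniform solid sphere, Rg = sqrt(3/5) * R.
    return math.sqrt(3.0 / 5.0) * radius


print(round(compact_sphere_rg(11000), 1))
```

For an 11 kDa protein (about 100 residues, assuming an average residue mass of roughly 110 Da) the function returns about 11.4 Å. Because volume grows linearly with mass, the compact-sphere Rg grows only with the cube root of mass.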
Bulk methods to study IDP structure and dynamics include:
- SAXS, for ensemble shape information;
- NMR, for atomistic ensemble refinement;
- fluorescence, for visualising molecular interactions and conformational transitions;
- X-ray crystallography, to highlight more mobile regions in otherwise rigid protein crystals;
- cryo-EM, to reveal less fixed parts of proteins;
- light scattering, to monitor size distributions of IDPs or their aggregation kinetics;
- circular dichroism, to monitor the secondary structure of IDPs.
Single-molecule methods to study IDPs include:
- spFRET, to study the conformational flexibility of IDPs and the kinetics of structural transitions;
- optical tweezers, for high-resolution insights into the ensembles of IDPs and their oligomers or aggregates;
- nanopores, to reveal global shape distributions of IDPs;
- magnetic tweezers, to study structural transitions for long times at low forces;
- high-speed AFM, to visualise the spatio-temporal flexibility of IDPs directly.
Disorder prediction
Disorder prediction algorithms can predict intrinsic disorder (ID) propensity with high accuracy (approaching 80%) based on primary sequence composition, similarity to unassigned segments in protein X-ray datasets, flexible regions in NMR studies and physico-chemical properties of amino acids.
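Accuracy figures for disorder predictors depend on how they are computed, because ordered residues usually outnumber disordered ones. The sketch below computes plain per-residue accuracy alongside balanced accuracy and the Matthews correlation coefficient (MCC) from boolean labels (True = disordered). The choice of metrics and the function names are illustrative assumptions, not the scoring protocol of any specific assessment.

```python
import math


def confusion(pred, truth):
    """Count TP, TN, FP, FN for per-residue boolean labels (True = disordered)."""
    if len(pred) != len(truth):
        raise ValueError("label vectors must have equal length")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return tp, tn, fp, fn


def scores(pred, truth):
    """Plain accuracy, balanced accuracy and MCC for one set of predictions."""
    tp, tn, fp, fn = confusion(pred, truth)
    accuracy = (tp + tn) / len(truth)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced = (sensitivity + specificity) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "balanced_accuracy": balanced, "mcc": mcc}


truth = [True] * 10 + [False] * 90   # 10% of residues disordered
always_ordered = [False] * 100       # trivial predictor
print(scores(always_ordered, truth))
```

A predictor that labels every residue as ordered reaches 0.9 plain accuracy on a protein that is 10% disordered, yet its balanced accuracy is 0.5 and its MCC is 0.0, which is why class-balanced measures are useful when comparing disorder predictors.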
Distinguishing IDPs from well-structured proteins
Separating disordered from ordered proteins is essential for disorder prediction. A first step towards finding a factor that distinguishes IDPs from non-IDPs is to identify biases in amino acid composition. The amino acids A, R, G, Q, S, P, E and K, most of which are hydrophilic and several of which are charged, have been characterized as disorder-promoting, while the order-promoting amino acids W, C, F, I, Y, V, L and N are mostly hydrophobic and uncharged. The remaining amino acids H, M, T and D are ambiguous, found in both ordered and unstructured regions. This information is the basis of most sequence-based predictors. Regions with little to no secondary structure, also known as NORS (NO Regular Secondary structure) regions, and low-complexity regions can easily be detected. However, not all disordered proteins contain such low-complexity sequences.
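The composition biases above can be turned into a toy per-window score: count disorder-promoting residues, subtract order-promoting ones and divide by the window length, treating H, M, T and D as neutral. Low complexity can likewise be flagged with a simple Shannon-entropy measure of residue composition. This is a sketch for illustration only, not a published predictor; the scoring formula, the use of Shannon entropy (dedicated low-complexity tools use their own measures), any window size and the function names are assumptions.

```python
import math
from collections import Counter

DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")
# H, M, T and D are ambiguous and contribute nothing to the score.


def composition_score(window):
    """(disorder-promoting - order-promoting) count per residue, in [-1, 1]."""
    d = sum(aa in DISORDER_PROMOTING for aa in window)
    o = sum(aa in ORDER_PROMOTING for aa in window)
    return (d - o) / len(window)


def shannon_entropy(window):
    """Shannon entropy (bits) of residue composition; low values = low complexity."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def sliding(sequence, size, func):
    """Apply func to every complete window of the given size along the sequence."""
    return [func(sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


seq = "MSEQKPSSGSPEKRGLIVWFCYLLNAIVF"
print(sliding(seq, 10, composition_score))
print(sliding(seq, 10, shannon_entropy))
```

Positive composition scores mark windows dominated by disorder-promoting residues. An entropy of 0 bits means a single repeated residue, while a window using all 20 residues equally reaches log2(20), about 4.32 bits.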
Prediction methods
Determining disordered regions by biochemical methods is very costly and time-consuming. Due to the variable nature of IDPs, only certain aspects of their structure can be detected, so a full characterization requires a large number of different methods and experiments. This further increases the expense of IDP determination. To overcome this obstacle, computer-based methods have been developed for predicting protein structure and function. Deriving knowledge by prediction is one of the main goals of bioinformatics. Predictors for IDP function are also being developed, but they mainly use structural information such as linear motif sites. There are different approaches to predicting IDP structure, such as neural networks or matrix calculations, based on different structural and/or biophysical properties.
Many computational methods exploit sequence information to predict whether a protein is disordered. Notable examples of such software include IUPred and DISOPRED. Different methods may use different definitions of disorder. Meta-predictors combine the outputs of several primary predictors to obtain a more robust and accurate prediction.
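One simple way a meta-predictor could combine primary predictors is to average their per-residue disorder scores and apply a threshold. Long disordered segments can then be extracted as runs of consecutive disordered residues, for example runs longer than 30 residues, the cut-off used for the DISOPRED2 statistics quoted in the History section. The averaging scheme, the 0.5 threshold and the function names are assumptions; published meta-predictors may weight or combine their inputs differently.

```python
def consensus(score_tracks, threshold=0.5):
    """Average per-residue disorder scores from several predictors, then threshold.

    score_tracks -- list of equal-length lists of scores in [0, 1],
                    one list per primary predictor (assumed input format)
    """
    if not score_tracks:
        raise ValueError("need at least one predictor")
    length = len(score_tracks[0])
    if any(len(track) != length for track in score_tracks):
        raise ValueError("all predictors must score every residue")
    mean = [sum(col) / len(score_tracks) for col in zip(*score_tracks)]
    return [s >= threshold for s in mean]


def disordered_segments(labels, min_length=31):
    """Half-open (start, end) ranges of consecutive disordered residues.

    The default keeps only runs longer than 30 residues.
    """
    segments, start = [], None
    # A trailing False acts as a sentinel that closes a run ending at the C-terminus.
    for i, is_disordered in enumerate(list(labels) + [False]):
        if is_disordered and start is None:
            start = i
        elif not is_disordered and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


print(consensus([[0.9, 0.2, 0.6], [0.7, 0.4, 0.3]]))
```

Averaging is the simplest combination rule; majority voting on thresholded outputs or a learned weighting are common alternatives.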
Because disorder predictors follow different approaches, estimating their relative accuracy is fairly difficult. For example, neural networks are often trained on different datasets. The disorder prediction category has been part of the biennial CASP experiment, which is designed to test methods according to their accuracy in finding regions with missing 3D structure (marked in PDB files as REMARK 465, i.e. missing electron density in X-ray structures).
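Because reference labels in such assessments come from residues that lack coordinates, a minimal sketch of extracting them from REMARK 465 records is shown below. It assumes the common whitespace-separated layout of residue lines (optional model number, residue name, chain identifier, residue number with optional insertion code), skips the header lines, and ignores insertion codes when building labels. The function names are hypothetical, missing density is only a proxy for disorder, and a robust parser should follow the fixed-column layout of the PDB format specification.

```python
import re

# Residue number with an optional one-letter insertion code, e.g. "46" or "10A".
SEQ_NUM = re.compile(r"^(-?\d+)([A-Z]?)$")


def missing_residues(pdb_lines):
    """Parse REMARK 465 lines into (chain, residue_name, number, insertion_code).

    Whitespace-token parsing (assumption): 'REMARK 465 [model] RES C SSSEQI'.
    Header lines such as 'MISSING RESIDUES' or 'M RES C SSSEQI' are skipped.
    """
    missing = []
    for line in pdb_lines:
        if not line.startswith("REMARK 465"):
            continue
        tokens = line[len("REMARK 465"):].split()
        if len(tokens) == 4 and tokens[0].isdigit():
            tokens = tokens[1:]  # drop the model number
        if len(tokens) != 3:
            continue
        res_name, chain, seq = tokens
        match = SEQ_NUM.match(seq)
        if match and res_name.isalpha() and len(chain) == 1:
            missing.append((chain, res_name, int(match.group(1)), match.group(2)))
    return missing


def disorder_labels(missing, chain, first, last):
    """Per-residue labels for residue numbers first..last (True = missing)."""
    absent = {num for ch, _, num, icode in missing if ch == chain and not icode}
    return [n in absent for n in range(first, last + 1)]


lines = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465   M RES C SSSEQI",
    "REMARK 465     MET A     1",
    "REMARK 465     GLY A     2",
]
print(disorder_labels(missing_residues(lines), "A", 1, 4))
```

Label vectors of this kind can then be compared residue by residue with a predictor's output.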
Disorder and disease
Intrinsically unstructured proteins have been implicated in a number of diseases. Aggregation of misfolded proteins causes many synucleinopathies and toxicity, as those proteins start binding to each other randomly, and can lead to cancer or cardiovascular diseases. Misfolding can happen spontaneously because millions of copies of proteins are made during the lifetime of an organism. The aggregation of the intrinsically unstructured protein α-synuclein is thought to be responsible for synucleinopathies. The structural flexibility of this protein, together with its susceptibility to modification in the cell, leads to misfolding and aggregation. Genetics, oxidative and nitrative stress as well as mitochondrial impairment affect the structural flexibility of the unstructured α-synuclein protein and associated disease mechanisms. Many key cancer-associated proteins, for example the tumor suppressors p53 and BRCA1, have large intrinsically unstructured regions. These regions mediate many of the proteins' interactions. Using the cell's native defense mechanisms as a model, drugs can be developed that block the binding sites of noxious substrates and inhibit them, thus counteracting the disease.
Computer simulations
Structural and dynamical properties of intrinsically unstructured proteins are being studied by molecular dynamics simulations. Findings from these simulations suggest that intrinsically disordered proteins form highly flexible conformational ensembles at different temperatures, which is related to the presence of low free-energy barriers.
Effects of confinement have also recently been addressed. For example, such studies suggest that confinement tends to increase the population of turn structures relative to the populations of coils and β-hairpins.
Moreover, various protocols and methods of analyzing IDPs, such as studies based on quantitative analysis of GC content in genes and their respective chromosomal bands, have been used to understand functional IDP segments.
Pioneering IDP research labs
In the last ten years, a great number of laboratories have investigated protein disorder using both experimental (e.g. SAXS-NMR, single-molecule fluorescence) and computational (analysis of protein structure) techniques.