Trisha Shetty (Editor)

Structure validation

Macromolecular structure validation is the process of evaluating the reliability of three-dimensional atomic models of large biological molecules such as proteins and nucleic acids. These models, which provide 3D coordinates for each atom in the molecule, come from structural biology experiments such as x-ray crystallography or nuclear magnetic resonance (NMR). Validation has three aspects: 1) checking the validity of the thousands to millions of measurements in the experiment; 2) checking how consistent the atomic model is with those experimental data; and 3) checking the consistency of the model with known physical and chemical properties.

Proteins and nucleic acids are the workhorses of biology, providing the necessary chemical reactions, structural organization, growth, mobility, reproduction, and environmental sensitivity. Essential to their biological functions are the detailed 3D structures of the molecules and the changes in those structures. To understand and control those functions, we need accurate knowledge about the models that represent those structures, including their many strong points and their occasional weaknesses.

End-users of macromolecular models include clinicians, teachers and students, as well as the structural biologists themselves, journal editors and referees, experimentalists studying the macromolecules by other techniques, and theoreticians and bioinformaticians studying more general properties of biological molecules. Their interests and requirements vary, but all benefit greatly from a global and local understanding of the reliability of the models.

Historical summary

Macromolecular crystallography was preceded by the older field of small-molecule x-ray crystallography (for structures with fewer than a few hundred atoms). Small-molecule diffraction data extend to much higher resolution than is feasible for macromolecules, and have a very clean mathematical relationship between the data and the atomic model. The residual, or R-factor, measures the agreement between the experimental data and the values back-calculated from the atomic model. For a well-determined small-molecule structure the R-factor is nearly as small as the uncertainty in the experimental data (well under 5%). Therefore, that one test by itself provides most of the validation needed, but automated software also runs a number of additional consistency and methodology checks, which are required for small-molecule crystal structure papers submitted to International Union of Crystallography (IUCr) journals such as Acta Crystallographica Section B or C. Atomic coordinates of these small-molecule structures are archived and accessed through the Cambridge Structural Database (CSD) or the Crystallography Open Database (COD).
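As an illustration of the R-factor described above, the commonly used form R = Σ| |Fobs| − k|Fcalc| | / Σ|Fobs| sums, over all measured reflections, the difference between observed structure-factor amplitudes and those back-calculated from the model, relative to the total observed amplitude. The sketch below assumes the amplitudes are already paired reflection by reflection; the function name `r_factor` and the fixed scale factor `scale` (k) are illustrative assumptions, since real refinement programs determine the scaling themselves.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Crystallographic R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|).

    f_obs, f_calc: observed and model-calculated amplitudes for the same
    reflections, in the same order. scale: assumed fixed scale factor k.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty lists of amplitudes")
    # Absolute disagreement per reflection, summed over all reflections.
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    # Normalize by the total observed amplitude.
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator
```

An R of 0.04 corresponds to the "well under 5%" level quoted for well-determined small-molecule structures.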

The first macromolecular validation software was developed around 1990, for proteins. It included Rfree cross-validation for model-to-data match, bond length and angle parameters for covalent geometry, and sidechain and backbone conformational criteria. For macromolecular structures, the atomic models are deposited in the Protein Data Bank (PDB), still the single archive of these data. The PDB was established in the 1970s at Brookhaven National Laboratory, moved in 2000 to the RCSB (Research Collaboratory for Structural Bioinformatics) centered at Rutgers, and expanded in 2003 to become the wwPDB (worldwide Protein Data Bank), with access sites added in Europe (PDBe) and Asia (PDBj), and with NMR data handled at the BioMagResBank (BMRB) in Wisconsin.
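Rfree cross-validation, mentioned above, computes the same R-factor on a small random subset of reflections that is set aside and never used to fit the model, so it cannot be lowered simply by adding parameters. The sketch below is a minimal illustration, not any program's implementation: the 5% test-set fraction, the fixed random seed, and the helper names (`r_value`, `choose_free_set`, `r_work_and_r_free`) are assumptions for illustration.

```python
import random

def r_value(f_obs, f_calc):
    """R-factor over paired amplitudes (no scale factor, for simplicity)."""
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc)) / denominator

def choose_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the test ("free") set."""
    rng = random.Random(seed)  # fixed seed so the same set is reused every cycle
    n_free = max(1, round(free_fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_free))

def r_work_and_r_free(f_obs, f_calc, free_set):
    """Return (R_work, R_free): R on the fitted reflections and on the held-out ones."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    free = sorted(free_set)
    r_work = r_value([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_value([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

The test set must stay fixed for the whole refinement; a large gap between R_work and R_free is the usual sign that the model has been overfitted to the working reflections.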

Validation rapidly became standard in the field, with further developments described below.

A large boost was given to the applicability of comprehensive validation for both x-ray and NMR on February 1, 2008, when the worldwide Protein Data Bank (wwPDB) made deposition of experimental data mandatory along with the atomic coordinates. Since 2012, strong forms of validation have been progressively adopted for wwPDB deposition, following recommendations of the wwPDB Validation Task Force committees for x-ray crystallography, NMR, SAXS (small-angle x-ray scattering), and cryoEM (cryo-electron microscopy).

Global vs local criteria

Many evaluation criteria apply globally to an entire experimental structure, most notably the resolution, the anisotropy or incompleteness of the data, and the residual or R-factor that measures overall model-to-data match. These help a user choose the most accurate among related Protein Data Bank entries to answer their questions. Other criteria apply to individual residues or local regions in the 3D structure, such as fit to the local electron density map or steric clashes between atoms. These are especially valuable to the structural biologist for making improvements to the model, and to the user for evaluating the reliability of that model right around the place they care about, such as a site of enzyme activity or drug binding. Both types of measures are very useful, but although global criteria are easier to state or publish, local criteria make the greatest contribution to scientific accuracy and biological relevance. As expressed in the Rupp textbook, "Only local validation, including assessment of both geometry and electron density, can give an accurate picture of the reliability of the structure model or any hypothesis based on local features of the model."
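Steric clashes are a good example of a local criterion: each flagged pair points to a specific place in the model. The minimal sketch below flags pairs of non-bonded atoms whose van der Waals spheres overlap by at least a threshold. The 0.4 Å cutoff mirrors the "serious clash" threshold used by MolProbity's clashscore, but the radii table, the input format, and the function name `find_clashes` are illustrative assumptions; real tools add explicit hydrogens, use their own radii, and use spatial indexing instead of checking every pair.

```python
import math
from itertools import combinations

# Illustrative Bondi-style van der Waals radii in angstroms (assumed values).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "H": 1.20}

def find_clashes(atoms, bonded=frozenset(), min_overlap=0.4):
    """Return (name_a, name_b, overlap) for atom pairs overlapping >= min_overlap.

    atoms: list of (name, element, (x, y, z)).
    bonded: set of frozenset({i, j}) index pairs to skip (covalently bonded atoms).
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if frozenset((i, j)) in bonded:
            continue  # bonded atoms are expected to be closer than vdW contact
        distance = math.dist(a[2], b[2])
        overlap = VDW_RADII[a[1]] + VDW_RADII[b[1]] - distance
        if overlap >= min_overlap:
            clashes.append((a[0], b[0], round(overlap, 2)))
    return clashes
```

For example, a carbon and an oxygen 2.5 Å apart overlap by 1.70 + 1.52 − 2.5 = 0.72 Å and are reported unless they are listed as bonded.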

Conformation (dihedrals): protein & RNA

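The backbone and sidechain dihedral angles of proteins and RNA have been shown to occur only in specific allowed combinations; for the protein backbone, the allowed combinations of φ and ψ are the favored regions of the Ramachandran plot. Dihedral values outside those combinations therefore flag local regions of a model that deserve scrutiny.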
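Checking those combinations starts from computing each dihedral from four consecutive atom positions. The sketch below uses the standard atan2 formulation with the IUPAC sign convention and shows how the backbone φ (C(i−1)–N–CA–C) and ψ (N–CA–C–N(i+1)) follow from it; the helper names are illustrative assumptions, and the comparison against reference Ramachandran or rotamer distributions is not shown.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone (phi, psi) of one residue from its neighbors' atoms."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)
```

Using atan2 rather than an arccosine keeps the sign, which is what distinguishes, for example, the right-handed α-helical region (negative φ) from the left-handed one.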

Carbohydrates

The branched and cyclic nature of carbohydrates poses particular problems for structure validation tools. At higher resolutions, it is possible to determine the sequence/structure of oligosaccharides and polysaccharides, both as covalent modifications and as ligands. However, at lower resolutions (typically worse than 2.0 Å), sequences/structures should either match known structures or be supported by complementary techniques such as mass spectrometry. Also, monosaccharides have clear conformational preferences (saturated rings are typically found in chair conformations), but errors introduced during model building and/or refinement, such as wrong linkage chirality or distance or a wrong choice of model, can bring their atomic models out of their energy minima. Recommendations on carbohydrate model building and refinement, and reviews of general errors in carbohydrate structures, have been published. Around 20% of the deposited carbohydrate structures are in unjustified energy minima.
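Whether a six-membered sugar ring is actually in a chair can be quantified with Cremer–Pople puckering parameters: the amplitude Q measures how far the ring departs from planarity, and the polar angle θ is near 0° or 180° for the two chairs and near 90° for the boat/skew-boat family. The sketch below is a minimal textbook implementation under stated assumptions (exactly six atoms supplied in bonding order; the function name is hypothetical); it is not Privateer's code, and which chair maps to 0° versus 180° depends on the atom ordering convention.

```python
import math

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cremer_pople_six(ring):
    """Return (Q, theta_degrees) for six (x, y, z) ring atoms in bonding order."""
    n = len(ring)
    if n != 6:
        raise ValueError("this sketch handles six-membered rings only")
    # Put the origin at the ring centroid.
    centroid = [sum(p[k] for p in ring) / n for k in range(3)]
    r = [[p[k] - centroid[k] for k in range(3)] for p in ring]
    # Mean-plane normal from the two Cremer-Pople auxiliary vectors.
    r1 = [sum(rj[k] * math.sin(2 * math.pi * j / n) for j, rj in enumerate(r)) for k in range(3)]
    r2 = [sum(rj[k] * math.cos(2 * math.pi * j / n) for j, rj in enumerate(r)) for k in range(3)]
    normal = _cross(r1, r2)
    length = math.sqrt(sum(c * c for c in normal))
    normal = [c / length for c in normal]
    # Out-of-plane displacement of each atom.
    z = [sum(rj[k] * normal[k] for k in range(3)) for rj in r]
    # m = 2 component (boat/twist character).
    c2 = sum(zj * math.cos(4 * math.pi * j / n) for j, zj in enumerate(z))
    s2 = sum(zj * math.sin(4 * math.pi * j / n) for j, zj in enumerate(z))
    q2 = math.sqrt(2 / n) * math.hypot(c2, s2)
    # m = 3 component (chair character).
    q3 = sum(zj * (-1) ** j for j, zj in enumerate(z)) / math.sqrt(n)
    return math.hypot(q2, q3), math.degrees(math.atan2(q2, q3))
```

A ring whose θ lies far from both chair values, without supporting density, is the kind of case the "unjustified" conformations above refer to.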

A number of carbohydrate validation web services are available at glycosciences.de (including nomenclature checks, linkage checks, and cross-validation with mass spectrometry data through GlycanBuilder), whereas the CCP4 suite currently distributes Privateer, a tool integrated into the model building and refinement process itself. Privateer can check stereo- and regio-chemistry, ring conformation and puckering, linkage torsions, and real-space correlation against positive omit density. It also generates aperiodic torsion restraints on ring bonds, which any refinement software can use to keep each monosaccharide in its minimal-energy conformation.

Privateer also generates scalable two-dimensional SVG diagrams following the Essentials of Glycobiology standard symbol nomenclature, with all the validation information included as tooltip annotations. This functionality is currently integrated into other CCP4 programs, such as the molecular graphics program CCP4mg (through the Glycoblocks 3D representation, which conforms to the standard symbol nomenclature) and the suite's graphical interface, CCP4i2.

Software and websites

  • RCSB PDB validation/deposition site
  • MolProbity web service
  • Protein structure validation database PDBREPORT
  • EDS (Electron Density Server)[32]
  • What_Check software
  • ProCheck software
  • Privateer (carbohydrate validation)
  • Coot modeling software (built-in validation)[33]
  • OOPS2, part of the Uppsala Software Factory
  • ProSA web service
  • Verify-3D profile analysis
  • PDB_REDO optimized X-ray structure models[34]
  • PROSESS - Protein Structure Evaluation Suite & Server
  • Resolution by Proxy, ResProx - protein model resolution-by-proxy
  • VADAR - Volume, Area, Dihedral Angle Reporter

For NMR

Data Validation: Chemical Shifts, NOEs, RDCs

AVS. Assignment validation suite (AVS) checks the chemical shifts list in BioMagResBank (BMRB) format for problems.

PSVS (Protein Structure Validation Server at the NESG)

PROSESS. PROSESS (Protein Structure Evaluation Suite & Server) is a new web server that offers an assessment of protein structural models by NMR chemical shifts as well as NOEs, geometrical, and knowledge-based parameters.

LACS. Linear analysis of chemical shifts is used for absolute referencing of chemical shift data.

Model-to-data validation

TALOS+. Predicts protein backbone torsion angles from chemical shift data. Frequently used to generate further restraints applied to a structure model during refinement.

Dynamics: core vs loops, tails, and mobile domains

One of the critical needs for NMR structural ensemble validation is to distinguish well-determined regions (those that have experimental data) from regions that are highly mobile and/or have no observed data. There are several current or proposed methods for making this distinction such as Random Coil Index, but so far the NMR community has not standardized on one.

Software and websites

  • PSVS (Protein Structure Validation Server at the NESG)[35]
  • CING (Common Interface for NMR structure Generation) software
  • ProCheckNMR software[36]
  • TALOS+ Software & Server (server for predicting protein backbone torsion angles from chemical shifts)
  • VADAR - Volume, Area, Dihedral Angle Reporter
  • PROSESS - Protein Structure Evaluation Suite & Server
  • ResProx - protein model resolution-by-proxy
  • MolProbity (includes analyses for NMR)

For cryoEM

Software and websites

  • EM Data Bank, for EM map deposition
  • EMDB at the PDB, info on ftp download of maps

For SAXS (small-angle x-ray scattering)

SAXS is a rapidly growing area of structure determination, both as a source of approximate 3D structure for initial or difficult cases and as a component of hybrid-method structure determination when combined with NMR, EM, crystallographic, cross-linking, or computational information. There is great interest in the development of reliable validation standards for SAXS data interpretation and for quality of the resulting models, but there are as yet no established methods in general use. Three recent steps in this direction are the creation of a Small-Angle Scattering Validation Task Force committee by the worldwide Protein Data Bank and its initial report, a set of suggested standards for data inclusion in publications, and an initial proposal of statistically derived criteria for automated quality evaluation.

For computational biology

It is difficult to do meaningful validation of an individual, purely computational, macromolecular model in the absence of experimental data for that molecule, because the model with the best geometry and conformational score may not be the one closest to the right answer. Therefore, much of the emphasis in validation of computational modeling is on assessment of the methods. To avoid bias and wishful thinking, double-blind prediction competitions have been organized, the original example of which (held every 2 years since 1994) is CASP (Critical Assessment of Structure Prediction), which evaluates predictions of 3D protein structure against newly solved crystallographic or NMR structures held in confidence until the end of the relevant competition. The major criterion for CASP evaluation is a score called GDT-TS, which averages the fractions of Cα atoms in the predicted model lying within 1, 2, 4, and 8 Å of their positions in the experimental structure.
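The GDT-TS arithmetic can be sketched as follows: for each cutoff d in {1, 2, 4, 8} Å, take the fraction P_d of matched Cα positions within d of the experimental structure, then report 100 × (P1 + P2 + P4 + P8) / 4. This sketch assumes the two models are already superposed and residue-matched, and the function name `gdt_ts` is hypothetical; the actual CASP evaluation searches over many superpositions to maximize each P_d, so a single fixed superposition can only give a lower or equal score.

```python
import math

def gdt_ts(pred_ca, ref_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS (0-100) for already-superposed, residue-matched C-alpha coordinates."""
    if len(pred_ca) != len(ref_ca) or not ref_ca:
        raise ValueError("need equal, non-empty lists of matched C-alpha positions")
    distances = [math.dist(p, r) for p, r in zip(pred_ca, ref_ca)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= cut for d in distances) / len(distances) for cut in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

For four residues displaced by 0.5, 1.5, 3.0, and 10.0 Å, the fractions are 0.25, 0.5, 0.75, and 0.75, giving a GDT-TS of 56.25.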

Software and websites

  • CASP experiments home page
  • Model validation in Yasara

References

Structure validation Wikipedia