Samiksha Jaiswal (Editor)

Biological database

Biological databases are libraries of life sciences information collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and clinical effects of mutations, as well as similarities of biological sequences and structures.

Biological databases can be broadly classified into sequence, structure and functional databases. Sequence databases store nucleic acid and protein sequences, while structure databases store solved structures of RNA and proteins. Functional databases provide information on the physiological role of gene products, for example enzyme activities, mutant phenotypes, or biological pathways. Model Organism Databases are functional databases that provide species-specific data. Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge helps in the fight against diseases, in the development of medications, in predicting certain genetic diseases, and in discovering basic relationships among species in the history of life.

Biological knowledge is distributed among many general and specialized databases, which sometimes makes it difficult to ensure the consistency of information. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One approach is for biological databases to cross-reference one another using accession numbers, linking their related knowledge together.
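As a rough illustration of cross-referencing by accession number, the following Python sketch links records across two toy in-memory databases. The database names (`seqdb`, `structdb`), the accession numbers and the `Record` type are all hypothetical and do not correspond to any real resource; real databases expose such links through their own record formats and services.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    accession: str
    description: str
    # Cross-references stored as (database name, accession number) pairs.
    xrefs: List[Tuple[str, str]] = field(default_factory=list)


# Two toy databases keyed by accession number (all identifiers are made up).
DATABASES: Dict[str, Dict[str, Record]] = {
    "seqdb": {
        "SEQ0001": Record(
            "SEQ0001", "example protein sequence", [("structdb", "STR0042")]
        ),
    },
    "structdb": {
        "STR0042": Record(
            "STR0042", "example solved structure", [("seqdb", "SEQ0001")]
        ),
    },
}


def resolve_xrefs(db_name: str, accession: str) -> List[Record]:
    """Follow a record's cross-references and return the linked records."""
    record = DATABASES[db_name][accession]
    linked = []
    for target_db, target_acc in record.xrefs:
        target = DATABASES.get(target_db, {}).get(target_acc)
        if target is not None:  # silently skip links that cannot be resolved
            linked.append(target)
    return linked
```

Calling `resolve_xrefs("seqdb", "SEQ0001")` returns the structure record that the sequence record points to. Links that cannot be resolved are skipped here; a real integration layer would more likely report them, since dangling links are one symptom of the consistency problem described above.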

Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management are a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures.
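To make the three representations concrete, here is a minimal Python sketch that parses one key-delimited record and re-expresses it as a table row and as an XML element. The field names (`ID`, `DESCRIPTION`, `ORGANISM`, `SEQUENCE`), the `KEY: value` layout and the record contents are invented for illustration and do not follow any particular database's flat-file format.

```python
import xml.etree.ElementTree as ET

# A made-up key-delimited record: one "KEY: value" pair per line.
RECORD_TEXT = """\
ID: GENE0001
DESCRIPTION: hypothetical example gene
ORGANISM: Escherichia coli
SEQUENCE: ATGGCGTAA
"""


def parse_key_delimited(text):
    """Parse "KEY: value" lines into a dict with lower-cased keys."""
    record = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(":")
        record[key.strip().lower()] = value.strip()
    return record


def to_table_row(record, columns):
    """Project the record onto a fixed column order, as in a table."""
    return [record.get(col, "") for col in columns]


def to_xml(record):
    """Serialize the record as a flat XML element, one child per field."""
    root = ET.Element("entry")
    for key, value in record.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


record = parse_key_delimited(RECORD_TEXT)
row = to_table_row(record, ["id", "organism", "sequence"])
xml_text = to_xml(record)
```

The same information survives in all three forms. The table row is the most rigid, since its columns are fixed in advance, while the key-delimited and XML forms tolerate optional or extra fields, which is part of why such data is called semi-structured.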

Types of Biological Databases

Biological databases are commonly divided into two types: primary databases and secondary databases. The two differ in how their archives are built. A primary database usually holds a single type of data, which is stored in its own archive. It accepts new data obtained from experiments and updates existing entries to maintain the quality of the data.

Secondary databases use other databases as their source of information, obtaining their data by querying those databases. They often process or analyze the retrieved data to produce new results.
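The relationship between the two types can be sketched in a few lines of Python. In this toy example, the primary database is a dictionary of raw sequences and the secondary database is derived from it by computing sequence length and GC content. The accession numbers, the sequences and the choice of derived values are assumptions made purely for illustration.

```python
# Toy primary database: accession number -> raw DNA sequence (made-up data).
PRIMARY = {
    "SEQ0001": "ATGGCGTAA",
    "SEQ0002": "ATATATAT",
}


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def build_secondary(primary):
    """Derive a secondary database of computed annotations from primary data."""
    return {
        acc: {"length": len(seq), "gc_content": gc_content(seq)}
        for acc, seq in primary.items()
    }


secondary = build_secondary(PRIMARY)
```

The secondary database stores no experimental data of its own. Every entry is computed from a primary entry and keeps the same accession number, so the derived values can be traced back to their source.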

Nucleic Acids Research Database Issue

An important resource for finding biological databases is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the publicly available online databases related to biology and bioinformatics. A companion database to the issue, called the Online Molecular Biology Database Collection, lists 1,380 online databases. Other collections of databases exist, such as MetaBase and the Bioinformatics Links Collection.

Access

Most biological databases are available through web sites that organise the data so that users can browse it online. In addition, the underlying data is usually available for download in a variety of formats, including text, sequence data, protein structure and links. Each of these can be found from certain sources, for example:

  • Text formats are provided by PubMed and OMIM.
  • Sequence data is provided by GenBank (DNA) and UniProt (protein).
  • Protein structures are provided by PDB, SCOP, and CATH.

Species-specific databases

Species-specific databases are available for some species, mainly those that are often used in research (model organisms). For example, EcoCyc is an E. coli database. Other popular Model Organism Databases include Mouse Genome Informatics for the laboratory mouse, Mus musculus, the Rat Genome Database for Rattus, ZFIN for Danio rerio (zebrafish), PomBase for the fission yeast Schizosaccharomyces pombe, FlyBase for Drosophila, WormBase for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae, and Xenbase for the frogs Xenopus tropicalis and Xenopus laevis.
