
Protein Data Bank Format


General Information
An exchange format for reporting experimentally determined three-dimensional structures of biological macromolecules that serves a global community of researchers, educators, and students. The data contained in the archive include atomic coordinates, bibliographic citations, primary and secondary structure information, crystallographic structure factors, and NMR experimental data.
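As an illustration of how the atomic coordinates mentioned above are laid out, the following minimal sketch reads one fixed-width ATOM/HETATM record. The column positions follow the published wwPDB format description; the names `Atom`, `parse_atom_line`, and `SAMPLE` are hypothetical, and the sample values are illustrative rather than taken from any real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Subset of the fields carried by an ATOM/HETATM record."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-width ATOM/HETATM record.

    The format description numbers columns from 1; Python slices are
    0-based and end-exclusive, so columns 31-38 become line[30:38].
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38, angstroms
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record, assembled field by field so the column widths are explicit.
SAMPLE = (
    "ATOM  "     # columns 1-6: record name
    "    1"      # columns 7-11: atom serial number
    " "          # column 12: blank
    " N  "       # columns 13-16: atom name
    " "          # column 17: alternate location indicator
    "MET"        # columns 18-20: residue name
    " "          # column 21: blank
    "A"          # column 22: chain identifier
    "   1"       # columns 23-26: residue sequence number
    "    "       # columns 27-30: insertion code plus three blanks
    "  27.340"   # columns 31-38: x
    "  24.430"   # columns 39-46: y
    "   2.614"   # columns 47-54: z
)

atom = parse_atom_line(SAMPLE)
print(atom.name, atom.res_name, atom.chain_id, atom.res_seq)
```

Slicing by column rather than splitting on whitespace is deliberate: adjacent fields in this format may touch with no separating space, so whitespace splitting can merge or misalign values.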

Record updated: Oct. 22, 2016, 11:01 a.m. by The FAIRsharing Team.



Additional Information


    No tools defined


Access / Retrieve Data

Conditions of Use

Related Standards

Terminology Artifacts

No semantic standards defined

Models and Formats

No syntax standards defined

Implementing Databases (58)
CAPS-DB : a structural classification of helix-capping motifs
CAPS-DB is a structural classification of helix-cappings or caps compiled from protein structures. Caps extracted from protein structures have been structurally classified based on geometry and conformation and organized in a tree-like hierarchical classification where the different levels correspond to different properties of the caps.

Database of Aligned Ribosomal Complexes
The Database for Aligned Ribosomal Complexes (DARC) site provides a resource for directly comparing the structures of available ribosomal complexes.

EcoliWiki: A Wiki-based community resource for Escherichia coli
Community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.

FunTree: A Resource For Exploring The Functional Evolution Of Structurally Defined Enzyme Superfamilies
A resource for exploring the evolution of protein function through relationships in sequence, structure, phylogeny and function.

Database of interaction hotspots across the proteome. Hot spots are energetically important residues at protein interfaces; they are not randomly distributed across the interface but clustered. These clustered hot spots form hot regions. HotRegion provides information on these interfaces using predicted hot spot residues and structural properties of the interface residues, such as pair potentials, accessible surface area (ASA), and relative ASA values in both the monomer and complex forms of proteins. It also provides 3D visualization of the interface and of the interactions among hot spot residues.

InterEvol database : Diving into the structure and evolution of protein complex interfaces
Evolution of protein-protein interfaces. InterEvol is a resource for researchers to investigate the structural interaction of protein molecules and sequences using a variety of tools and resources.

MINAS - A Database of Metal Ions in Nucleic AcidS
MINAS contains the exact geometric information on the first and second-shell coordinating ligands of every metal ion present in nucleic acid structures that are deposited in the PDB and NDB. Containing also the sequence information of the binding pocket-proximal nucleotides, this database allows for a detailed search of all combinations of potential ligands and of coordination environments of metal ions. MINAS is therefore a perfect new tool to classify metal ion binding pockets in nucleic acids by statistics and to draw general conclusions about the different coordination properties of these ions.

MIPModDB: A Central Resource for the Superfamily of Major Intrinsic Proteins
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI. The structural models of MIP proteins were created using a defined protocol. The database aims to provide key information on MIPs based on both sequence and structure, which will help to decipher the function of uncharacterized MIPs.

Prokaryotic Glycoproteins Database
ProGlycProt (Prokaryotic Glycoproteins) is a manually curated, comprehensive repository of experimentally characterized eubacterial and archaeal glycoproteins, generated from an exhaustive literature search. This is the focused beginning of an effort to provide concise relevant information derived from rapidly expanding literature on prokaryotic glycoproteins, their glycosylating enzyme(s), glycosylation linked genes, and genomic context thereof, in a cross-referenced manner.

Protein-Chemical Structural Interactions
Protein-Chemical Structural Interactions provides information on the 3-dimensional structures of protein interactions with low-molecular-weight chemicals.

Protein Structure Change Database
The Protein Structural Change DataBase (PSCDB) presents the structural changes found in proteins, represented by pairs of ligand-free and ligand-bound structures of identical proteins, and links these changes to ligand-binding.

SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants
SNPeffect is a database for phenotyping human single nucleotide polymorphisms (SNPs). SNPeffect primarily focuses on the molecular characterization and annotation of disease and polymorphism variants in the human proteome. Further, SNPeffect holds per-variant annotations on functional sites, structural features and post-translational modification.

Statistical Torsional Angles Potentials of NMR Refinement Database
The STAP database contains refined versions of the NMR structures deposited in PDB. These refinements have been performed using a statistical torsion angle potential and a structurally or experimentally derived distance potential. The refined structures have a significantly improved structural quality compared to their initial NMR structures.

Compilation and Creation of datasets from PDB
ccPDB (Compilation and Creation of datasets from PDB) is a collection of commonly used data sets for structural or functional annotation of proteins. There are numerous datasets from the literature and the Protein Data Bank (PDB), which were used for developing methods to annotate proteins at the sequence (or residue) level. A tool is available for creating a wide range of customized data sets from PDB.

Death Domain Database
Death Domain Database is a manually curated database of protein-protein interactions for Death Domain Superfamily.

Indel Flanking Region Database
Indel Flanking Region Database is an online resource for indels (insertion/deletions) and the flanking regions of proteins in SCOP superfamilies. It aims at providing a comprehensive dataset for analyzing the qualities of amino acid indels, substitutions and the relationship between them.

Validated NMR structures of proteins and nucleic acids.

Pocket Similarity Search using Multiple-Sketches
POcket Similarity Search Using Multiple-Sketches (PoSSuM) includes all the discovered protein-small molecule binding site pairs with annotations of various types (e.g., UniProt, CATH, SCOP, SCOPe, EC number and Gene ontology). PoSSuM enables rapid exploration of similar binding sites among structures with different global folds as well as similar folds. Moreover, PoSSuM is useful for predicting the binding ligand for unbound structures.

SitEx database of eukaryotic protein functional sites
SitEx is a database containing information on eukaryotic protein functional sites. It stores the amino acid sequence positions in the functional site in relation to the exon structure of the encoding gene. This can be used to detect the exons involved in shuffling in protein evolution, or to design protein-engineering experiments.

PASS2 contains alignments of structural motifs of protein superfamilies. PASS2 is an automatic version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies). PASS2 contains alignments of protein structures at the superfamily level and is in direct correspondence with SCOPe 2.04 release.

Drug-related information: medical indications, adverse drug effects, drug metabolism and Gene Ontology terms of the target proteins.

Virus Pathogen Database and Analysis Resource
The Virus Pathogen Database and Analysis Resource (ViPR) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. The database is available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.

Influenza Research Database
The Influenza Research Database (IRD) is a free, open, publicly-accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization, and comparative genomics analysis, together with personal login-protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature.

Telomerase Database
The Telomerase Database is a Web-based tool for the study of structure, function, and evolution of the telomerase ribonucleoprotein. The objective of this database is to serve the research community by providing a comprehensive compilation of information known about telomerase enzyme and its substrate, telomeres.

Functional Coverage of the Proteome
FCP is a publicly accessible web tool dedicated to analysing the current state and trends on the population of available structures along the classification schemes of enzymes and nuclear receptors, offering both graphical and quantitative data on the degree of functional coverage in that portion of the proteome by existing structures, as well as on the bias observed in the distribution of those structures among proteins.

Evolutionary Trace
Relative evolutionary importance of amino acids within a protein sequence.

SURFACE is a database containing the results of a large-scale protein annotation and local structural comparison project. The homepage of the resource has not been updated since 2003 (and the maintainer's website since 2010). Until we have confirmation of the status of this project, we have classified it as uncertain.

Chemical Component Dictionary
The Chemical Component Dictionary is an external reference file describing all residue and small molecule components found in Protein Data Bank entries. It contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, aromatic bond assignments, idealized coordinates, chemical descriptors (SMILES & InChI), and systematic chemical names.

Catalytic Site Atlas
The Catalytic Site Atlas (CSA) is a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure. It uses a defined classification for catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by an enzyme.

Protein Data Bank in Europe
The Protein Data Bank in Europe is a founding member of the worldwide Protein Data Bank which collects, organises and disseminates data on biological macromolecular structures.

PDBsum; at-a-glance overview of macromolecular structures
PDBsum provides an overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.

Protein Data Bank: Proteins, Interfaces, Structures and Assemblies
The Protein Quaternary Structure file server (PDBePISA) is an internet resource that makes available coordinates for likely quaternary states for structures contained in the Brookhaven Protein Data Bank (PDB) that were determined by X-ray crystallography.

PROCOGNATE is a database of cognate ligands for the domains of enzyme structures in CATH, SCOP and Pfam. The database contains an assignment of PDB ligands to the domains of structures as classified by the CATH, SCOP and Pfam databases. Cognate ligands have been identified using data from the ENZYME and KEGG databases and compared to the PDB ligand using graph matching to assess chemical similarity. Cognate ligands from the known reactions in ENZYME and KEGG for a particular enzyme are then assigned to enzyme structures which have EC numbers.

Protein Classification Benchmark Collection
The Protein Classification Benchmark Collection was created in order to create standard datasets on which the performance of machine learning methods can be compared.

ArchDB is a compilation of structural classifications of loops extracted from known protein structures. The structural classification is based on the geometry and conformation of the loop. The geometry is defined by four internal variables and the type of regular flanking secondary structures, resulting in 10 different loop types. Loops in ArchDB have been classified using an improved version (Espadaler et al.) of the original ArchType program published in 1997 by Oliva et al.

PSIbase is a molecular interaction database based on PSIMAP (PDB, SCOP) that focuses on the structural interaction of proteins and their domains.

Protein Model Database
The Protein Model DataBase (PMDB) is a database that collects manually built three-dimensional protein models, obtained by different structure prediction techniques.

SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.

Structural Classification Of Proteins
The SCOP database is curated both manually and with the use of automated tools. This freely available resource aims to provide a comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

Molecular Modeling Database (MMDB)
The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more.

Transporter Classification Database
The database details a comprehensive IUBMB approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, except that it incorporates both functional and phylogenetic information. Descriptions, TC numbers, and examples of over 600 families of transport proteins are provided. Transport systems are classified on the basis of five criteria, and each of these criteria corresponds to one of the five numbers or letters within the TC# for a particular type of transporter.

Protein Data Bank Japan
The Protein Data Bank is the single worldwide archive of structural data of biological macromolecules.

Ligand Expo
Ligand Expo is a data resource for finding information about small molecules bound to proteins and nucleic acids.

RCSB Protein Data Bank
This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data. The RCSB PDB builds upon the data by creating tools and resources for research and education in molecular biology, structural biology, computational biology, and beyond.

TargetTrack, a target registration database, provides information on the experimental progress and status of targets selected for structure determination.

Nucleic Acids Database
The NDB contains information about experimentally-determined nucleic acids and complex assemblies. NDB can be used to perform searches based on annotations relating to sequence, structure and function, and to download, analyze, and learn about nucleic acids.

Sanger Pfam Mirror
The Pfam database contains information about protein domains and families. For each entry a protein sequence alignment and a Hidden Markov Model is stored.

3D interacting domains
The database of 3D Interaction Domains (3did) is a collection of domain-domain interactions in proteins for which high-resolution three-dimensional structures are known. 3did exploits structural information to provide critical molecular details necessary for understanding how interactions occur.

SWISS-MODEL Repository of 3D protein structure models
The SWISS-MODEL Repository is a database of annotated 3D protein structure models generated by the SWISS-MODEL homology-modelling pipeline for protein sequences of selected model organisms.

CATH Protein Structure Classification
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in this hierarchy: Class (secondary structure classification, e.g. mostly alpha), Architecture (classification based on overall shape), Topology (fold family) and Homologous superfamily (protein domains which are thought to share a common ancestor). This collection is concerned with superfamily classification.

Biological Magnetic Resonance Databank
BMRB collects, annotates, archives, and disseminates (worldwide in the public domain) the important spectral and quantitative data derived from NMR spectroscopic investigations of biological macromolecules and metabolites. The goal is to empower scientists in their analysis of the structure, dynamics, and chemistry of biological systems and to support further development of the field of biomolecular NMR spectroscopy.

Electron Microscopy Data Bank
Cryo-electron microscopy reconstruction methods are uniquely able to reveal structures of many important macromolecules and macromolecular complexes. The Electron Microscopy Data Bank (EMDB) is a public repository for electron microscopy density maps of macromolecular complexes and subcellular structures. It covers a variety of techniques, including single-particle analysis, electron tomography, and electron (2D) crystallography. The EMDB was founded at EBI in 2002, under the leadership of Kim Henrick. Since 2007 it has been operated jointly by the PDBe and the Research Collaboratory for Structural Bioinformatics (RCSB PDB) as a part of EMDataBank, which is funded by a joint NIH grant to PDBe, the RCSB and the National Center for Macromolecular Imaging (NCMI).

KnotProt: A database of proteins with knots and slipknots
KnotProt collects information about proteins with knots or slipknots. The knotting complexity of proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The database presents extensive information about the biological function of proteins with non-trivial knotting and enables users to analyze new structures.

A database of protein disorder and mobility annotations. MobiDB was designed to offer a centralized resource for annotations of intrinsic protein disorder. The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions are also classified for disorder content.

RepeatsDB is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.

Worldwide Protein Data Bank
The Worldwide PDB (wwPDB) organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community. The mission of the wwPDB is to maintain a single Protein Data Bank Archive of macromolecular structural data that is freely and publicly available to the global community. The wwPDB is composed of the RCSB PDB, PDBe, PDBj and BMRB.

Model Archive - theoretical models of macromolecular structures
The Model Archive provides a stable archive for computational macro-molecular models published in the scientific literature. The model archive provides a unique stable accession code (DOI) for each deposited model, which can be directly referenced in the corresponding manuscripts. Model Archive is part of the Protein Model Portal.

LinkProt: A database of proteins with topological links
LinkProt collects information about protein chains and complexes that form links. LinkProt detects deterministic links (with loops closed by cysteine), and determines the likelihood of formation of links in networks of protein chains called MacroLinks. Links are presented graphically in an intuitive way, using tools that involve surfaces of minimal area spanned on closed loops. The database presents extensive information about biological functions of proteins with links and enables users to analyze new structures.


Implementing Policies

This record is not implemented by any policy.


Record Maintainer

  • This record is in need of a maintainer.



Announcing the worldwide Protein Data Bank.

Berman H., Henrick K., Nakamura H.
Nat. Struct. Biol. 2003
