FASTA Sequence Format

General Information
FASTA format is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows for sequence names and comments to precede the sequences.
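As a rough illustration of the layout described above, the following Python sketch reads FASTA text into records. It assumes the common convention that each record begins with a header line starting with `>`, that the first whitespace-delimited token of the header is the sequence name, and that the rest of the header is a free-text comment. The function name `read_fasta` is hypothetical and does not belong to any particular library.

```python
import io


def read_fasta(handle):
    """Yield (name, comment, sequence) tuples from FASTA-formatted text.

    Assumption: a record starts with a '>' header line; the first token of
    the header is the name and the remainder is the comment.
    """
    name = None
    comment = ""
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, comment, "".join(chunks)
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            comment = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if name is not None:
        yield name, comment, "".join(chunks)


if __name__ == "__main__":
    example = ">seq1 example peptide\nMKT\nAYI\n>seq2\nACGT\n"
    for record in read_fasta(io.StringIO(example)):
        print(record)
```

The generator joins wrapped sequence lines into one string per record and does not validate the single-letter codes, so nucleotide and peptide files are handled identically.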

This record is maintained by wrpearson

Record updated: Aug. 5, 2016, 2:39 p.m. by The FAIRsharing Team.



No tools defined


No XSD schemas defined

Related Standards

Reporting Guidelines

No guidelines defined

Terminology Artifacts

No semantic standards defined

Implementing Databases (252)
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. GenBank growth statistics for both the traditional GenBank divisions and the WGS division are available from each release.

Animal Transcription Factor Database
AnimalTFDB is a comprehensive animal transcription factor database. The resource provides a classification of transcription factors from 50 genomes from species including Homo sapiens and Caenorhabditis elegans. The database also has information on co-transcription factors and chromatin remodelling factors.

Apo and Holo structures DataBase
AH-DB (Apo and Holo structures DataBase) collects the apo and holo structure pairs of proteins. Proteins are frequently associated with other molecules to perform their functions. Experimental structures determined in the bound state are named holo structures; while structures determined in the unbound state are named apo structures.

Aspergillus Genome Database
The Aspergillus Genome Database is a resource for genomic sequence data as well as gene and protein information for Aspergilli. This publicly available repository is a central point of access to genome, transcriptome and polymorphism data for the fungal research community.

Bitter Compounds Database
BitterDB is a free and searchable database of bitter compounds. Compounds can be searched by name, chemical structure, similarity to other bitter compounds, association with a particular human bitter taste receptor, and so on. The database also contains information on mutations in bitter taste receptors that were shown to influence receptor activation by bitter compounds. The aim of BitterDB is to facilitate studying the chemical features associated with bitterness.

Bacterial protein tYrosine Kinase database
The Bacterial protein tYrosine Kinase database (BYKdb) contains computer-annotated BY-kinase sequences. The database web interface allows static and dynamic queries and provides integrated analysis tools including sequence annotation.

Central Aspergillus Data REpository
This project aims to support the international Aspergillus research community by gathering all genomic information regarding this significant genus into one resource - The Central Aspergillus REsource (CADRE). CADRE facilitates visualisation and analyses of data using the Ensembl software suite. Much of our data has been extracted from GenBank and augmented with the consent of the original sequencing groups. This additional work has been carried out using both automated and manual efforts, with support from specific annotation projects and the general Aspergillus community.

CAPS-DB : a structural classification of helix-capping motifs
CAPS-DB is a structural classification of helix-cappings or caps compiled from protein structures. Caps extracted from protein structures have been structurally classified based on geometry and conformation and organized in a tree-like hierarchical classification where the different levels correspond to different properties of the caps.

ChEMBL: a large-scale bioactivity database for drug discovery
ChEMBL is an open large-scale bioactivity database containing information largely manually extracted from the medicinal chemistry literature. Information regarding the compounds tested (including their structures), the biological or physicochemical assays performed on these and the targets of these assays are recorded in a structured form, allowing users to address a broad range of drug discovery questions.

ConoServer is a database specializing in sequences and structures of peptides expressed by marine cone snails.

CoryneRegNet 6.0 - Corynebacterial Regulation Network
Corynebacterial Regulation Network is a reference database and analysis platform for corynebacterial transcription factors and gene regulatory networks.

Dragon Antimicrobial Peptide Database
Dragon Antimicrobial Peptide Database is a manually curated database of known and putative antimicrobial peptides (AMPs). It covers both prokaryotic and eukaryotic organisms.

DataBase of Transcriptional Start Sites
This database includes TSS data from adult and embryonic human tissue. DBTSS now contains 491 million TSS tag sequences collected from a total of 20 tissues and 7 cell cultures.

The DNA Data Bank of Japan
Annotated collection of all publicly available nucleotide and protein sequences. As a member of INSDC, the DDBJ Center in Japan collects and provides nucleotide sequence data together with ENA/EBI in Europe and NCBI in the USA. DDBJ collects sequence data mainly from Japanese researchers, as well as from researchers in other countries. Ninety-nine percent of INSD data from Japanese researchers are submitted through DDBJ.

Human Disease-Related Viral Integration Sites
Dr.VIS collects and locates human disease-related viral integration sites. So far, about 600 sites covering 5 virus organisms and 11 human diseases are available. Integration sites in Dr.VIS are located by chromosome, cytoband, gene and RefSeq position as specifically as possible. Viral-cellular junction sequences are extracted from papers and nucleotide databases, and linked to the corresponding integration sites. Graphic views summarizing the distribution of viral integration sites are generated according to chromosome maps.

EcoliWiki: A Wiki-based community resource for Escherichia coli
Community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.

Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups
eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) is a database of orthologous groups of genes. The orthologous groups are annotated with functional description lines (derived by identifying a common denominator for the genes based on their various annotations) and with functional categories (i.e., derived from the original COG/KOG categories).

Eukaryotic Linear Motifs
This computational biology resource mainly focuses on annotation and detection of eukaryotic linear motifs (ELMs) by providing both a repository of annotated motif data and an exploratory tool for motif prediction. ELMs, or short linear motifs (SLiMs), are compact protein interaction sites composed of short stretches of adjacent amino acids.

Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs containing sequence from other organisms in Drosophila, and information on where to buy Drosophila strains and constructs.

DRSC Functional Genomics Resources
DRSC Functional Genomics Resources (DRSC-FGR) began as the Drosophila RNAi Screening Center (DRSC), founded by Prof. Norbert Perrimon in 2003, and the Transgenic RNAi Project (TRiP), founded by Prof. Perrimon in 2008. DRSC-FGR has been previously known as flyRNAi.org. It has since grown into a functional genomics platform meeting the needs of the Drosophila and broader community.

Fungal and Oomycete genomics resource
FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes comparative genomics, analysis of gene expression, and supplemental bioinformatics analyses and a web interface for data-mining.

FunTree: A Resource For Exploring The Functional Evolution Of Structurally Defined Enzyme Superfamilies
A resource for exploring the evolution of protein function through relationships in sequence, structure, phylogeny and function.

GABI-Kat SimpleSearch
T-DNA insertions in Arabidopsis and their flanking sequence tags.

GeneDB is a genome database for prokaryotic and eukaryotic organisms and provides a portal through which data generated by the "Pathogen Genomics" group at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be accessed.

Human Gene and Protein Database
Human Gene and Protein Database (HGPD) presents SDS-PAGE patterns and other information on human genes and proteins.

Human Histone Database
HIstome (Human histone database) is a freely available, specialist, electronic database dedicated to displaying information about human histone variants, the sites of their post-translational modifications, and various histone modifying enzymes.

Integrative and Conjugative Elements in Bacteria
A web-based resource for integrative and conjugative elements (ICEs) found in bacteria. It collates available data from experimental and bioinformatics analyses, and literature, about known and putative ICEs in bacteria as a PostgreSQL-based database called ICEberg. This database contains detailed information on all archived ICEs and the genes carried by each entity, including unique identifiers, species details and hyperlink-paths to other public databases, like NCBI, UniprotKB and KEGG.

Intrinsically Disordered proteins with Extensive Annotations and Literature
IDEAL provides a collection of knowledge on experimentally verified intrinsically disordered proteins (IDPs) or intrinsically disordered regions (IDRs). IDEAL contains manually curated annotations on IDPs in locations, structures, and functional sites such as protein binding regions and posttranslational modification sites together with references and structural domain assignments.

IMG/M: the integrated metagenome data management and comparative analysis system
Data management and analysis system for metagenomes

InterEvol database : Diving into the structure and evolution of protein complex interfaces
Evolution of protein-protein interfaces. InterEvol is a resource for researchers to investigate the structural interaction of protein molecules and sequences using a variety of tools and resources.

The LegumeIP 2.0 database hosts large-scale genomics and transcriptomics data and provides integrative bioinformatics tools for the study of gene function and evolution in legumes.

This resource provides information primarily on the upstream non-coding sequence data of genes in 3 genomes, which gives insight into transcription factor binding sites (TFBSs). For each transcript, the region scanned extends from 10,000 bp upstream of the transcript start to 50 bp downstream of the coding sequence start. Therefore, the database contains putative binding sites in the gene promoter and in the initial introns and non-coding exons. Information displayed for each putative binding site includes the transcription factor name, its position (absolute on the chromosome, or relative to the gene), the score of the prediction, and the region of the gene the site belongs to. If the selected gene has homologs in either of the other two organisms, the program optionally displays the putative TFBSs in the homologs.
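The scan window described above can be sketched as a small coordinate calculation. This is only an illustration under stated assumptions: coordinates are 1-based and inclusive, the strand handling is an assumption (the record does not say how reverse-strand genes are treated), and the function name `scan_region` is hypothetical.

```python
UPSTREAM_BP = 10_000   # bases scanned upstream of the transcript start
DOWNSTREAM_BP = 50     # bases scanned downstream of the coding sequence start


def scan_region(transcript_start, cds_start, strand="+"):
    """Return (start, end) of the scanned window in chromosome coordinates.

    Assumptions: 1-based inclusive coordinates; on the '-' strand,
    "upstream" means higher coordinates. The window is clamped at position 1.
    """
    if strand == "+":
        start = max(1, transcript_start - UPSTREAM_BP)
        end = cds_start + DOWNSTREAM_BP
    elif strand == "-":
        start = max(1, cds_start - DOWNSTREAM_BP)
        end = transcript_start + UPSTREAM_BP
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end


if __name__ == "__main__":
    print(scan_region(25_000, 25_300))        # forward-strand gene
    print(scan_region(80_000, 79_700, "-"))   # reverse-strand gene
```

Because the window ends downstream of the coding sequence start rather than at the transcript start, it covers the 5' untranslated exons and any introns between them, which matches the statement that sites in initial introns and non-coding exons are included.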

Minimotif Miner 3.0
A database of short functional motifs involved in posttranslational modifications, binding to other proteins, nucleic acids, or small molecules.

Molecular INTeraction database
MINT focuses on experimentally verified protein-protein interactions mined from the scientific literature by expert curators. As of September 2013, MINT uses the IntAct database infrastructure to limit the duplication of efforts and to optimise future software development. Data maintenance and release, MINT PSICQUIC and IMEx services are under the responsibility of the IntAct team, while curation effort will be carried out by both groups. Data manually curated by the MINT curators can now also be accessed from the IntAct homepage at the EBI.

MIPModDB: A Central Resource for the Superfamily of Major Intrinsic Proteins
This is a database of comparative protein structure models of the MIP (Major Intrinsic Protein) family of proteins. Nearly complete sets of MIPs have been identified from the completed genome sequences of organisms available at NCBI. The structural models of MIP proteins were created using a defined protocol. The database aims to provide key information on MIPs based on both sequence and structure. This will further help to decipher the function of uncharacterized MIPs.

mirEX2 is a comprehensive platform for comparative analysis of primary microRNA expression data. RT–qPCR-based gene expression profiles are stored in a universal and expandable database scheme and wrapped by an intuitive user-friendly interface.

Integrated web resource of mitochondrial localisation evidence and phenotype data for mammals, zebrafish and yeasts.

modMine is an integrated web resource of data & tools to browse and search modENCODE data and experimental details, download results and access the GBrowse genome browser.

A comprehensive repository for omics data from the red spotted newt Notophthalmus viridescens from high throughput experiments. Newt-Omics aims to provide a comprehensive platform of expressed genes during tissue regeneration, including extensive annotations, expression data and experimentally verified peptide sequences with as yet no homology to other publicly available gene sequences. The goal is to obtain a detailed understanding of the molecular processes underlying tissue regeneration in the newt, which may lead to the development of approaches that efficiently stimulate regenerative pathways in mammals.

NONCODE is a database of noncoding RNAs (except tRNAs and rRNAs). This resource is a comprehensive collection and annotation of noncoding RNAs including details of long noncoding (lnc) RNAs. The database provides access to data on the relationships of lncRNAs to disease, conservation annotation, and high-quality datasets.

Families of nuclear hormone receptors

Online GEne Essentiality database

GENI-ACT is a resource that allows the research community to collaboratively annotate bacterial genomes. Changes can be suggested to existing genomes and these alterations can be ported back to NCBI GenBank. GENI-ACT also has modules which can be used for educational purposes.

Pfam Protein Families
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.

Prokaryotic Glycoproteins Database
ProGlycProt (Prokaryotic Glycoproteins) is a manually curated, comprehensive repository of experimentally characterized eubacterial and archaeal glycoproteins, generated from an exhaustive literature search. This is the focused beginning of an effort to provide concise relevant information derived from rapidly expanding literature on prokaryotic glycoproteins, their glycosylating enzyme(s), glycosylation linked genes, and genomic context thereof, in a cross-referenced manner.

Protein-Chemical Structural Interactions
Protein-Chemical Structural Interactions provides information on the 3-dimensional structures of protein interactions with low molecular weight chemicals.

ProtoNet 6.0: Organizing 10 million protein sequences into a compact hierarchical family tree
This resource is a clustering of UniProt protein sequences into hierarchical trees. It allows for the study of the sub-families and super-families of a protein, using UniRef50 clusters.

Saccharomyces Genome Database
The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. SGD contains a variety of biological information and tools with which to search and analyze it.

SMART 7: Recent updates to the protein domain annotation resource
Simple modular architecture research tool: signalling, extracellular and chromatin-associated protein domains

SNPeffect 4.0: on-line prediction of molecular and structural effects of protein-coding variants
SNPeffect is a database for phenotyping human single nucleotide polymorphisms (SNPs). SNPeffect primarily focuses on the molecular characterization and annotation of disease and polymorphism variants in the human proteome. Further, SNPeffect holds per-variant annotations on functional sites, structural features and post-translational modification.

Search Tool for Interactions of Chemicals
STITCH is a resource to explore known and predicted interactions of chemicals and proteins. Chemicals are linked to other chemicals and proteins by evidence derived from experiments, databases and the literature.

Group II introns database
Database for identification and cataloguing of group II introns. All bacterial introns listed in the main page are full-length and appear to be functional, based on intron RNA and IEP characteristics. The database names the full-length introns, and provides information on their boundaries, host genes, and secondary structures. In addition, the website provides tools for analysis that may be useful to researchers who encounter group II introns in DNA sequences.

Autism Knowledgebase
Autism genetics KnowledgeBase, an evidence-based knowledgebase of autism genetics.

BacMap is a picture atlas of annotated bacterial genomes. It is an interactive visual database containing hundreds of fully labeled, zoomable, and searchable maps of bacterial genomes.

Compilation and Creation of datasets from PDB
ccPDB (Compilation and Creation of datasets from PDB) is a collection of commonly used data sets for structural or functional annotation of proteins. There are numerous datasets from the literature and the Protein Data Bank (PDB), which were used for developing methods to annotate proteins at the sequence (or residue) level. A tool is available for creating a wide range of customized data sets from PDB.

Detection of functional divergence in human protein families. Cube-DB is a database of pre-evaluated conservation and specialization scores for residues in paralogous proteins belonging to multi-member families of human proteins. Protein family classification follows (largely) the classification suggested by HUGO Gene Nomenclature Committee. Sets of orthologous protein sequences were generated by mutual-best-hit strategy using full vertebrate genomes available in Ensembl. The scores, described on documentation page, are assigned to each individual residue in a protein, and presented in the form of a table (html or downloadable xls formats) and mapped, when appropriate, onto the related structure (Jmol, Pymol, Chimera).

Death Domain Database
Death Domain Database is a manually curated database of protein-protein interactions for Death Domain Superfamily.

GWASdb comprises collections of trait/disease-associated SNPs (TASs) from current GWAS and their comprehensive functional annotations, as well as disease classifications.

miRNEST is an integrative collection of animal, plant and virus microRNA data.

MitoZoa is a specialized database collecting complete and nearly-complete (longer than 7 kb) mtDNA entries of metazoan species, excluding Placozoa. MitoZoa contains curated entries, whose gene annotation has been significantly improved using a semi-automatic reannotation pipeline and by manual curation of mitogenomics experts. MitoZoa has been specifically designed to address comparative analyses of mitochondrial genomic features in a given metazoan group or in species belonging to the same genus (congeneric species). MitoZoa focuses on mitochondrial gene order, non-coding regions, gene content, and gene sequences.

Nematode.net is the home page of the parasitic nematode EST project at The Genome Institute at Washington University in St. Louis. The site was established in 2000 as a component of the NIH-NIAID grant "A Genomic Approach to Parasites from the Phylum Nematoda". While Nematode.net started as a project site, over the years it became a community resource dedicated to the study of parasitic nematodes.

SitEx database of eukaryotic protein functional sites
SitEx is a database containing information on eukaryotic protein functional sites. It stores the amino acid sequence positions in the functional site, in relation to the exon structure of the encoding gene. This can be used to detect the exons involved in shuffling in protein evolution, or to design protein-engineering experiments.

The SpliceDisease database provides information linking RNA splicing to human disease, including the change of the nucleotide in the sequence, the location of the mutation on the gene, the reference PubMed ID and a detailed description of the relationship among gene mutations, splicing defects and diseases.

VIRsiRNAdb contains information on experimentally validated viral siRNA/shRNA which target viral genome regions. It provides efficacy information where available, along with the siRNA sequence, viral target and subtype, and the target genomic region.

Ebola and Hemorrhagic Fever Virus Database
The Ebola and Hemorrhagic Fever Virus Database stems from the Hemorrhagic Fever Viruses (HFV) Database Project founded by Dr. Carla Kuiken in 2009 at the Los Alamos National Laboratory (LANL). The HFV Database was modeled on the Los Alamos HIV Database, led by Dr. Bette Korber, and translated much of its tools, infrastructure and philosophy from HIV to HFV.

PASS2 contains alignments of structural motifs of protein superfamilies. PASS2 is an automatic version of the original superfamily alignment database, CAMPASS (CAMbridge database of Protein Alignments organised as Structural Superfamilies). PASS2 contains alignments of protein structures at the superfamily level and is in direct correspondence with SCOPe 2.04 release.

PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as well as providing structural and functional annotation and access to large-scale data sets.

Prokaryotic Operon DataBase
The Prokaryotic Operon DataBase (ProOpDB) constitutes one of the most precise and complete repositories of operon predictions currently available. Using our novel and highly accurate operon algorithm, we have predicted the operon structures of more than 1,200 prokaryotic genomes.

TTD, Therapeutic Target Database
The Therapeutic Target Database provides information about therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets. Also included in this database are links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature, drug structure, therapeutic class and clinical development status. All information is fully referenced.

The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI GenBank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data (microarray or high-throughput RNA sequencing). Direct contributions or notices of publication of functional genomic data or bioinformatic analyses from archaeal research labs are very welcome.

Virulence Factor Database
VFDB is an integrated and comprehensive database of virulence factors for bacterial pathogens (also including Chlamydia and Mycoplasma).

Virus Pathogen Database and Analysis Resource
The Virus Pathogen Database and Analysis Resource (ViPR) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. The database is available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.

The Yeast Metabolome DataBase
The Yeast Metabolome Database (YMDB) is a manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae (also known as Baker’s yeast and Brewer’s yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases.

A comprehensive online knowledgebase for the monkey research community.

Immuno Polymorphism Database - IMGT/HLA
The IPD-IMGT/HLA Database provides a specialist database for sequences of the human major histocompatibility complex (MHC) and includes the official sequences named by the WHO Nomenclature Committee For Factors of the HLA System. The IMGT/HLA Database was established to provide a locus-specific database (LSDB) for the allelic sequences of the genes in the HLA system, also known as the human MHC. The IMGT/HLA Database was first released in 1998 and subsequently incorporated as a module of IPD in 2012.

mycoCLAP is a searchable resource for the knowledge and annotation of Characterized Lignocellulose-Active Proteins of fungal origin.

Influenza Research Database
The Influenza Research Database (IRD) is a free, open, publicly-accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user-friendly interfaces for data retrieval, visualization, and comparative genomics analysis, together with personal login-protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature.

Human Endogenous Retrovirus database
This database is compiled from the human genome nucleotide sequences obtained mostly in the Human Genome Projects. The database makes it possible to continuously improve classification and characterization of retroviral families. The HERV database now contains retroviruses from more than 90 % of the human genome.

This resource implements the SNP discovery software autoSNP within a relational database to enable the efficient mining of the identified polymorphisms and the detailed interrogation of the data. autoSNP was selected because it does not require sequence trace files and is thus applicable to a broader range of species and datasets.

miRNEST is a database of animal, plant and virus microRNAs, containing miRNA predictions conducted on Expressed Sequence Tags of animal and plant species.

Addgene is a non-profit plasmid repository dedicated to helping scientists around the world share high-quality plasmids. Addgene is working with thousands of laboratories to assemble a high-quality library of published plasmids for use in research and discovery. By linking plasmids with articles, scientists can always find data related to the materials they request.

Assembling the Fungal Tree of Life
The Assembling the Fungal Tree of Life (AFTOL) project is dedicated to significantly enhancing our understanding of the evolution of the Kingdom Fungi, which represents one of the major clades of life.

AgBase is a curated, open-source, Web-accessible resource for functional analysis of agricultural plant and animal gene products.

Databases of Orthologous Promoters
DoOP is a database of eukaryotic promoter sequences (upstream regions), aiming to facilitate the recognition of regulatory sites conserved between species. Based on the Arabidopsis thaliana and Homo sapiens genome annotation, this resource is also a collection of the orthologous promoter sequences from Viridiplantae and Chordata species. The database can be used to find promoter clusters of different genes as well as positions of the conserved regions and transcription start sites, which can be viewed graphically.

Allergome aims to supply information on Allergenic Molecules (Allergens) causing an IgE-mediated (allergic, atopic) disease (anaphylaxis, asthma, atopic dermatitis, conjunctivitis, rhinitis, urticaria). The resource is funded through the Allergen Data Laboratories via unrestricted grants from companies and institutions.

Evolutionary Annotation Database
Evola contains ortholog information of all human genes among vertebrates. Orthologs are a pair of genes in different species that evolved from a common ancestral gene by speciation. In Evola, orthologs were detected by comparative genomics and amino acid sequence analysis (Computational analysis).

DNASU Plasmid Repository
DNASU is a central repository for plasmid clones and collections. Currently we store and distribute over 197,000 plasmids including 75,000 human and mouse plasmids, full genome collections, the protein expression plasmids from the Protein Structure Initiative as the PSI: Biology Material Repository (PSI : Biology-MR), and both small and large collections from individual researchers. We are also a founding member and distributor of the ORFeome Collaboration plasmid collection.

Telomerase Database
The Telomerase Database is a Web-based tool for the study of structure, function, and evolution of the telomerase ribonucleoprotein. The objective of this database is to serve the research community by providing a comprehensive compilation of information known about telomerase enzyme and its substrate, telomeres.

Mammalian Protein Localization Database
LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set.

Drosophila polymorphism database
Drosophila Polymorphism Database is a secondary database designed to provide a collection of all the existing polymorphic sequences in the Drosophila genus. It allows, for the first time, the search for any polymorphic set according to different parameter values of nucleotide diversity.

Evolutionary Trace
Relative evolutionary importance of amino acids within a protein sequence.

Beijing Genomics Institute Rice Information System
In BGI-RIS, sequence contigs of Beijing indica and Syngenta japonica have been further assembled and anchored onto the rice chromosomes. The database has annotated the rice genomes for gene content, repetitive elements, and SNPs. Sequence polymorphisms between different rice subspecies have also been identified.

Chicken Variation Database
The Chicken Variation Database (ChickVD) is an integrated information system for storage, retrieval, visualization and analysis of chicken variation data.

Influenza Virus Database
IVDB hosts complete genome sequences of influenza A virus generated by BGI and curates all other published influenza virus sequences after expert annotations. IVDB provides a series of tools and viewers for analyzing the viral genomes, genes, genetic polymorphisms and phylogenetic relationships comparatively.

Pig Genomic Informatics System
The Pig Genomic Informatics System (PigGIS) presents accurate pig gene annotations in all sequenced genomic regions. It integrates various available pig sequence data, including 3.84 million whole-genome shotgun (WGS) reads and 0.7 million Expressed Sequence Tags (ESTs) generated by the Sino-Danish Pig Genome Project, and 1 million miscellaneous GenBank records.

YanHuang - YH1 Genome Database
The YH database presents the entire DNA sequence of a Han Chinese individual, as a representative of the Asian population. This genome, named YH, is the start of the YanHuang Project, which aims to sequence 100 Chinese individuals in 3 years. The genome was assembled based on 3.3 billion reads (117.7 Gbp raw data) generated by Illumina Genome Analyzer. In total, 102.9 Gbp of nucleotides were mapped onto the NCBI human reference genome (Build 36) by the self-developed software SOAP (Short Oligonucleotide Alignment Program), and 3.07 million SNPs were identified.

The Barcode of Life Data Systems
The Barcode of Life Data Systems (BOLD) is an online workbench that aids collection, management, analysis, and use of DNA barcodes. It consists of 3 components (MAS, IDS, and ECS) that each address the needs of various groups in the barcoding community.

Ciona intestinalis Protein Database
CIPRO is an integrated protein database that has been developed to provide wide-ranging information on the proteins expressed in the ascidian Ciona intestinalis, especially for researchers who want advanced and useful information for starting biological and biomedical research. The protein information in CIPRO directly links to gene expression, a tool for peptide mass fingerprinting (PMF), intracellular localization, 3D images of early development, and transgenic resources.

PROkariotIC Database Of Gene-Regulation
PRODORIC is a comprehensive database about gene regulation and gene expression in prokaryotes. It includes a manually curated and unique collection of transcription factor binding sites.

National Microbial Pathogen Data Resource
The NMPDR provided curated annotations in an environment for comparative analysis of genomes and biological subsystems, with an emphasis on the food-borne pathogens Campylobacter, Listeria, Staphylococcus, Streptococcus, and Vibrio; as well as the STD pathogens Chlamydiaceae, Haemophilus, Mycoplasma, Neisseria, Treponema, and Ureaplasma.

siRecords is a collection of a diverse range of mammalian RNAi experiments. After choosing a gene, researchers can find all siRNA records targeting the gene, design a new siRNA targeting it, or submit siRNAs that have been tested. The resource also helps experimental RNAi researchers by providing them with the efficacy and other information about the siRNA experiments designed and conducted previously against the genes of their interest.

Human-Transcriptome Database for Alternative Splicing
H-DBAS offers unique data and viewer for human Alternative Splicing (AS) analysis including genome-wide representative alternative splicing variants (RASVs), RASVs affecting protein functions, conserved RASVs compared with mouse genome (full length cDNAs).

Cnidarian Evolutionary Genomics Database
CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians.

StellaBase is the Nematostella vectensis genomics database.

Tandem Repeats Database
Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA and contains a variety of tools for their analysis.

Comparative Fungal Genomics Platform
The CFGP (Comparative Fungal Genomics Platform) was designed for comparative genomics projects with diverse fungal genomes.

Magnaporthe grisea Database
The Magnaporthe comparative genomics database provides access to multiple fungal genomes from the Magnaporthaceae family to facilitate comparative analysis. The project is a partnership between the International Rice Blast Genome Consortium and the Broad Institute. The project is facilitated by an Advisory Board made up of members of the rice blast research community.

The goal of MutDB is to annotate human variation data with protein structural information and other functionally relevant information, if available. The mutations are organized by gene.

PhosphoSite Plus
PhosphoSite Plus provides extensive information on mammalian post-translational modifications (PTMs). The resource supersedes PhosphoSite, a mammalian protein database that provides information about in vivo phosphorylation sites.

The Chromosome 7 Annotation Project
The objective of this project is to generate the most comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications.

RNAiDB provides access to results from RNA interference (RNAi) studies in C. elegans, including images, movies, phenotypes, and graphical maps.

Candida Genome Database
The Candida Genome Database (CGD) provides access to genomic sequence data and manually curated functional information about genes and proteins of the human pathogen Candida albicans. It collects gene names and aliases, and assigns gene ontology terms to describe the molecular function, biological process, and subcellular localization of gene products.

The Database of Human DNA Methylation and Cancer
The database of human DNA Methylation and Cancer (MethyCancer) was developed to study the interplay of DNA methylation, gene expression and cancer. It hosts both highly integrated data on DNA methylation, cancer-related genes, mutations and cancer information from public resources, and the CpG Island (CGI) clones derived from our large-scale sequencing.

Shanghai Rapeseed Database
Shanghai RAPESEED Database: a resource for functional genomics studies of seed development and fatty acid metabolism of Brassica.

GreenPhylDB: A phylogenomic database for plant comparative genomics
GreenPhylDB comprises 37 full genomes from the major phylum of plant evolution. Clustering of these genomes was performed to define a consistent and extensive set of homeomorphic plant families.

OryGenesDB: an interactive tool for rice reverse genetics
The aim of this Oryza sativa database was first to display sequence information such as the T-DNA and Ds flanking sequence tags (FSTs) produced in the framework of the French genomics initiative Genoplante and the EU consortium Cereal Gene Tags. This information was later linked with related molecular data from external rice molecular resources (cDNA full length, Gene, EST, Markers, Expression data...).

Oryza Tag Line
Oryza Tag Line consists of a searchable database developed under the Oracle management system integrating phenotypic data resulting from the evaluation of the Genoplante rice insertion line library.

TropGENE DB is a database that manages genetic and genomic information about tropical crops studied by Cirad. The database is organised into crop specific modules.

Information system for G protein-coupled receptors (GPCRs)
The GPCRDB is a molecular-class information system that collects, combines, validates and stores large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains data on sequences, ligand binding constants and mutations. In addition, many different types of computationally derived data are stored such as multiple sequence alignments and homology models.

PhylomeDB is a public database for complete catalogs of gene phylogenies (phylomes). Researchers are able to use this resource to visualise the history of genes with the available phylogenetic trees and multiple sequence alignments.

Plant Genome Network
PGN is a repository for plant EST sequence data located at Cornell. It comprises an analysis pipeline and a website, and presently contains mainly data from the Floral Genome Project.

Sol Genomics Network
The Sol Genomics Network (SGN) is a database and website dedicated to the genomic information of the Solanaceae family, which includes species such as tomato, potato, pepper, petunia and eggplant.

The DrugBank database is a freely available bioinformatics and chemoinformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information.

European Nucleotide Archive
The European Nucleotide Archive (ENA) is a nucleotide database which is part of an international nucleotide sequence database collaboration. This collaboration comprises ENA itself, the DNA DataBank of Japan (DDBJ), and NCBI GenBank. This resource was formerly called the EMBL nucleotide sequence database. ENA provides a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.

European Patent Office - Protein and Nucleic Acid Sequences
Sequences extracted from European Patent Office (EPO) patents.

Integrated resource of protein families, domains and functional sites
InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites.

Ligand-Gated Ion Channel database
The Ligand-Gated Ion Channel database provides nucleic acid and protein sequences of the subunits of ligand-gated ion channels. The database can be used to generate multiple sequence alignments from selected subunits, and gives the atomic coordinates of subunits, or portions of subunits, where available.

PDBsum; at-a-glance overview of macromolecular structures
PDBsum provides an overview of every macromolecular structure deposited in the Protein Data Bank (PDB), giving schematic diagrams of the molecules in each structure and of the interactions between them.

The Ensembl project is based at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). It provides one of the most comprehensive and integrated resources of genomic data, which can be accessed through the web (www.ensembl.org) and through BioMart, FTP, Perl APIs, REST API and MySQL queries.

MycoBrowser leprae
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium leprae information.

MycoBrowser marinum
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium marinum information.

MycoBrowser smegmatis
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium smegmatis information.

MycoBrowser tuberculosis
Mycobrowser is a resource that provides both in silico generated and manually reviewed information within databases dedicated to the complete genomes of Mycobacterium tuberculosis, Mycobacterium leprae, Mycobacterium marinum and Mycobacterium smegmatis. This collection references Mycobacterium tuberculosis information.

Expression Database in 4D
This database provides a platform to query and compare gene expression data during the development of the major model animals (zebrafish, Drosophila, medaka, mouse). The high-resolution expression data were acquired through whole-mount in situ hybridisation, antibody or transgenic experiments.

Simple Modular Architecture Research Tool
SMART (Simple Modular Architecture Research Tool) is a web resource providing simple identification and extensive annotation of protein domains and the exploration of protein domain architectures.

STRING: functional protein association networks
STRING is a database of known and predicted protein interactions. The interactions include direct (physical) and indirect (functional) associations.

Cryptosporidium Genomics Resource
CryptoDB is an integrated genomic and functional genomic database for the parasite Cryptosporidium. CryptoDB integrates whole genome sequence and annotation along with experimental data and environmental isolate sequences provided by community researchers. It also includes supplemental bioinformatics analyses and a web interface for data-mining.

Eukaryotic Pathogen Database Resources
EuPathDB is an integrated database covering the eukaryotic pathogens. While each of the taxonomic groups within this resource is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all of these resources, and the opportunity to leverage orthology for searches across genera.

A detailed study of Giardia lamblia's genome will provide insights into an early evolutionary stage of eukaryotic chromosome organization as well as other aspects of the prokaryotic / eukaryotic divergence.

MicrosporidiaDB is one of the databases that can be accessed through the EuPathDB (http://EuPathDB.org; formerly ApiDB) portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

PlasmoDB is a genome database for the genus Plasmodium, a set of single-celled eukaryotic pathogens that cause human and animal diseases, including malaria.

Toxoplasma Genomics Resource
ToxoDB is a genome database for the genus Toxoplasma, a set of single-celled eukaryotic pathogens that cause human and animal diseases, including toxoplasmosis.

TrichDB is one of the databases that can be accessed through the EuPathDB (http://EuPathDB.org; formerly ApiDB) portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

TriTrypDB is one of the databases that can be accessed through the EuPathDB (http://EuPathDB.org; formerly ApiDB) portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

Human Oral Microbiome Database
The Human Oral Microbiome Database (HOMD) provides a site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity. It contains genomic information based on a curated 16S rRNA gene-based provisional naming scheme, and taxonomic information.

The Gene Index Project
The goal of The Gene Index Project is to use the available EST and gene sequences, along with the reference genomes wherever available, to provide an inventory of likely genes and their variants and to annotate these with information regarding the functional roles played by these genes and their products.

Protein Classification Benchmark Collection
The Protein Classification Benchmark Collection was created to provide standard datasets on which the performance of machine learning methods can be compared.

Interrupted coding sequences
The ICDS database contains interrupted coding sequences (ICDS) detected by a similarity-based approach. The definition of each interrupted gene is provided, as well as the ICDS genomic localisation with the surrounding sequence.

The aim of the PEROXISOME database (PeroxisomeDB) is to gather, organise and integrate curated information on peroxisomal genes, their encoded proteins, their molecular function, the metabolic pathways they belong to, and their related disorders.

ImMunoGeneTics Information System
IMGT is a high-quality integrated knowledge resource specialized in the immunoglobulins (IG) or antibodies, T cell receptors (TR), major histocompatibility complex (MHC) of human and other vertebrate species, and in the immunoglobulin superfamily (IgSF), major histocompatibility complex superfamily (MhcSF) and related proteins of the immune system (RPI) of vertebrates and invertebrates.

Annotated regulatory Binding Sites from Orthologous Promoters
ABS is a database of annotated regulatory binding sites, drawn from known binding sites identified in promoters of orthologous vertebrate genes.

Database of Protein Disorder
The Database of Protein Disorder (DisProt) is a curated database that provides information about proteins that lack fixed 3D structure in their putatively native states, either in their entirety or in part.

Drosophila Species Genomes
This resource covers the D. melanogaster genome and eight other eukaryote model genomes, with gene predictions from several groups. Summaries of essential genome statistics include sizes, genes found and predicted, homology among genomes, phylogenetic trees of species, and comparisons of several gene predictions for sensitivity and specificity in finding new and known genes.

Eukaryotic Genes
euGenes provides a common summary of gene and genomic information from eukaryotic organism databases including gene symbol and full name, chromosome, genetic and molecular map information, Gene Ontology (Function/Location/Process) and gene homology, product information.

Daphnia Water Flea Genome Database
wFleaBase includes data from all species of the genus, yet the primary species are Daphnia pulex and Daphnia magna, because of the broad set of genomic tools that have already been developed for these animals.

Aphid Genomics Database
The Aphid Genome Database's aim is to improve the current pea aphid genome assembly and annotation, and to provide new aphid genome sequences as well as tools for analysis of these genomes.

SpodoBase is an integrated database for the genomics of the Lepidoptera Spodoptera frugiperda. It is a publicly available structured database with insect pest sequences which will allow identification of a number of genes and comprehensive cloning of gene families of interest for the scientific community.

Buruli ulcer bacillus Database
A database dedicated to the analysis of the genome of Mycobacterium ulcerans, the Buruli ulcer bacillus. It provides a complete dataset of DNA and protein sequences derived from the epidemic strain Agy99, linked to the relevant annotations and functional assignments.

The purpose of Colibri is to collate and integrate various aspects of the genomic information from E. coli, the paradigm of Gram-negative bacteria. Colibri provides a complete dataset of DNA and protein sequences derived from the paradigm strain E. coli K-12, linked to the relevant annotations and functional assignments. It allows one to easily browse through these data and retrieve information, using various criteria (gene names, location, keywords, etc.).

GenoList Genome Browser
GenoList is an integrated environment for comparative exploration of microbial genomes.

Legionella pneumophila genome database
LegioList is a database dedicated to the analysis of the genomes of Legionella pneumophila strain Paris (endemic in France), strain Lens (epidemic isolate), strain Philadelphia 1, and strain Corby. It also includes the genome of Legionella longbeachae strain NSW150.

Listeria innocua and Listeria monocytogenes genomes database
ListiList is a database dedicated to the analysis of the genomes of the food-borne pathogen, Listeria monocytogenes, and its non-pathogenic relative, Listeria innocua. Its purpose is to collate and integrate various aspects of the genomic information from L. monocytogenes, a paradigm for bacterial-host interactions.

Mycoplasma pulmonis genome database
The purpose of MypuList is to collate and integrate various aspects of the genomic information from M. pulmonis, a mollicute causal agent of murine respiratory mycoplasmosis. MypuList provides a complete dataset of DNA and protein sequences derived from the strain M. pulmonis UAB CTIP, linked to the relevant annotations and functional assignments.

Photorhabdus luminescens genome database
PhotoList contains a database dedicated to the analysis of the genome of Photorhabdus luminescens. This analysis has been described in: "The genome sequence of the entomopathogenic bacterium Photorhabdus luminescens".

Streptococcus agalactiae NEM316 / Serotype III genome database
SagaList contains a database dedicated to the analysis of the genomes of the food-borne pathogen, Streptococcus agalactiae.

Bacillus subtilis strain 168 genome
The purpose of SubtiList is to collate and integrate various aspects of the genomic information from B. subtilis, the paradigm of sporulating Gram-positive bacteria. SubtiList provides a complete dataset of DNA and protein sequences derived from the paradigm strain B. subtilis 168, linked to the relevant annotations and functional assignments.

The PeptideAtlas Project provides a publicly accessible database of peptides identified in tandem mass spectrometry proteomics studies and software tools.

Bactibase: database dedicated to bacteriocins
BACTIBASE contains calculated or predicted physicochemical properties of bacteriocins produced by both Gram-positive and Gram-negative bacteria. The information in this database is easy to extract and allows rapid prediction of the structure/function relationships and target organisms of these peptides, and therefore better exploitation of their biological activity in both the medical and food sectors.

Human Protein Reference Database
The Human Protein Reference Database represents a centralized platform to visually depict and integrate information pertaining to domain architecture, post-translational modifications, interaction networks and disease association for each protein in the human proteome.

Fungal Nomenclature and Species Bank
MycoBank is an online database documenting new mycological names and combinations, combined where available with descriptions and illustrations.

SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The quantitative trait loci (QTL) represent more than 18 years of QTL mapping of more than 90 unique traits. SoyBase also contains the well-annotated 'Williams 82' genomic sequence and associated data mining tools. The genetic and sequence views of the soybean chromosomes and the extensive data on traits and phenotypes are extensively interlinked. This allows entry to the database using almost any kind of available information, such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the repository for controlled vocabularies for soybean growth, development and trait terms, which are also linked to the more general plant ontologies.

European Mouse Mutant Archive
The European Mouse Mutant Archive (EMMA) is a non-profit repository for the collection, archiving (via cryopreservation) and distribution of relevant mutant strains essential for basic biomedical research. The laboratory mouse is the most important mammalian model for studying genetic and multi-factorial diseases in man. Thus the work of EMMA will play a crucial role in exploiting the tremendous potential benefits to human health presented by the current research in mammalian genetics.

Integrated Microbial Genomes
The integrated microbial genomes (IMG) system is a data management, analysis and annotation platform for all publicly available genomes. IMG contains both draft and complete JGI (DoE Joint Genome Institute) microbial genomes integrated with all other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses.

PSIbase is a molecular interaction database based on PSIMAP (PDB, SCOP) that focuses on the structural interactions of proteins and their domains.

BeetleBase is a community resource for Tribolium genetics, genomics and developmental biology. The database is built on the Chado generic data model, and is able to store various types of data, ranging from genome sequences to mutant phenotypes.

Human Unidentified Gene-Encoded large proteins database
HUGE is a database for human large proteins newly identified in the Kazusa cDNA project, the aim of which is to predict the primary structure of proteins from the sequences of human large cDNAs (>4 kb).

ROdent Unidentified Gene-Encoded large proteins
The ROUGE protein database is a sister database of HUGE protein database which has accumulated the results of comprehensive sequence analysis of human long cDNAs (KIAA cDNAs). The ROUGE protein database has been created to publicize the information obtained from mouse homologues of the KIAA cDNAs (mKIAA cDNAs).

Yeast Searching for Transcriptional Regulators and Consensus Tracking
YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking) is a curated repository of more than 48333 regulatory associations between transcription factors (TF) and target genes in Saccharomyces cerevisiae, based on more than 1200 bibliographic references.

Protein Model Database
The Protein Model DataBase (PMDB) is a database that collects manually built three-dimensional protein models, obtained by different structure prediction techniques.

A 16S rRNA gene database which provides chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.

Maize Genetics and Genomics Database
MaizeGDB is the maize research community's central repository for genetics and genomics information.

The Global Proteome Machine Database
Rather than being a complete record of a proteomics experiment, this database holds the minimum amount of information necessary for certain bioinformatics-related tasks, such as sequence assignment validation. Most of the data is held in a set of XML files.

The NEW Antirrhinum majus (Snapdragon) genetic and genomic database

Rat Genome Database
The Rat Genome Database is the premier site for genetic, genomic, phenotype, and disease data generated from rat research. It provides easy access to corresponding human and mouse data for cross-species comparison and its comprehensive data and innovative software tools make it a valuable resource for researchers worldwide.

Mouse Genome Database - a Mouse Genome Informatics (MGI) Resource
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. Data includes gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data.

Conserved Domain Database
The Conserved Domain Database (CDD) brings together several collections of multiple sequence alignments representing conserved domains, including NCBI-curated domains, which use 3D-structure information to explicitly define domain boundaries and provide insights into sequence/structure/function relationships, as well as domain models imported from a number of external source databases (Pfam, SMART, COG, NCBI Protein Clusters, TIGRFAM). NCBI-curated models are organized hierarchically into families, sub- and super-families, and come with annotation of functional sites.

Clusters of Orthologous Groups of Proteins: Phylogenetic classification of proteins encoded in complete genomes
Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in complete genomes, representing major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain.

Expressed Sequence Tags database
The dbEST contains sequence data and other information on "single-pass" cDNA sequences, or "Expressed Sequence Tags", from a number of organisms.

Database of Single Nucleotide Polymorphism
Database of single nucleotide polymorphisms (SNPs) and multiple small-scale variations that include insertions/deletions, microsatellites, and non-polymorphic variants.

Influenza Virus Resource
Influenza Virus Resource presents data obtained from the NIAID Influenza Genome Sequencing Project as well as from GenBank, combined with tools for flu sequence analysis, annotation and submission to GenBank. In addition, it provides links to other resources that contain flu sequences, publications and general information about flu viruses.

Molecular Modeling Database (MMDB)
The Molecular Modeling Database (MMDB), as part of the Entrez system, facilitates access to structure data by connecting them with associated literature, protein and nucleic acid sequences, chemicals, biomolecular interactions, and more.

NCBI Gene provides information for genes from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.

NCBI Viral genomes
This collection of virus genomic sequences is a part of Entrez Genome that provides curated sequence data and related information for the community.

Organelle Genome Resource
The organelle genomes are part of the NCBI Reference Sequence (RefSeq) project that provides curated sequence data and related information for the community to use as a standard.

Plant Genome Central
The list of plant sequencing projects in this page includes those that have reached the stage where active sequence determination is currently producing, or is expected to produce in the near future, GenBank accessions toward the goal of determining the sequence of that plant genome.

ProtClustDB is a collection of related protein sequences (clusters) consisting of Reference Sequence proteins encoded by complete genomes. This database contains both curated and non-curated clusters.

Reference Sequence Database
The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins.

UniGene gene-oriented nucleotide sequence clusters
Each UniGene entry is a set of transcript sequences that appear to come from the same transcription locus (gene or expressed pseudogene), together with information on protein similarities, gene expression, cDNA clone reagents, and genomic location.

UniSTS was a database of sequence tagged sites (STSs), derived from STS-based maps and other experiments. All data from this resource have been moved to the Probe database. You can retrieve all UniSTS records by searching the probe database using the search term "unists[properties]". Additionally, legacy data remain on the NCBI FTP Site in the UniSTS Repository (ftp://ftp.ncbi.nih.gov/pub/ProbeDB/legacy_unists). If you have any specific questions, please feel free to contact us at info@ncbi.nlm.nih.gov

UniVec is a database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin (vector contamination). In addition to vector sequences, UniVec also contains sequences for those adapters, linkers, and primers commonly used in the process of cloning cDNA or genomic DNA.

NIA Mouse cDNA Project
A catalog of mouse genes expressed in early embryos, embryonic and adult stem cells was assembled. The cDNA libraries are freely distributed to the research community, providing a standard platform for expression studies using microarrays.

Rice Genome Annotation Project
This website provides genome sequence from the Nipponbare subspecies of rice and annotation of the 12 rice chromosomes. These data are available through search pages and the Genome Browser that provides an integrated display of annotation data.

Drug Adverse Reaction Target
A database for facilitating the search for drug adverse reaction targets. DART contains information about known drug adverse reaction targets, their functions and properties.

Restriction enzymes and methylases database
A collection of information about restriction enzymes and related proteins. It contains published and unpublished references, recognition and cleavage sites, isoschizomers, commercial availability, methylation sensitivity, crystal, genome, and sequence data.

Transporter Classification Database
The database details a comprehensive IUBMB approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, except that it incorporates both functional and phylogenetic information. Descriptions, TC numbers, and examples of over 600 families of transport proteins are provided. Transport systems are classified on the basis of five criteria, and each of these criteria corresponds to one of the five numbers or letters within the TC# for a particular type of transporter.
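
As an illustration of the five-component identifier described above, the following minimal Python sketch splits a TC# into its parts. The dot-separated layout, the field names and the example value "1.A.1.1.1" are assumptions made for illustration; the description above states only that a TC# has five numbers or letters, one per classification criterion.

```python
from typing import NamedTuple


class TCNumber(NamedTuple):
    # Field names are illustrative labels, not taken from the description above.
    transporter_class: str
    subclass: str
    family: str
    subfamily: str
    system: str


def parse_tc_number(tc: str) -> TCNumber:
    """Split a TC# into five components, assuming dot-separated notation."""
    parts = tc.strip().split(".")
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected five dot-separated components, got {tc!r}")
    return TCNumber(*parts)


if __name__ == "__main__":
    print(parse_tc_number("1.A.1.1.1"))
```

A named tuple keeps the five positions readable without adding validation rules that the description does not state.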

Oryzabase is a comprehensive rice science database established in 2000 by the rice researchers' committee in Japan. Oryzabase consists of five parts: (1) genetic resource stock information, (2) gene dictionary, (3) chromosome maps, (4) mutant images, and (5) fundamental knowledge of rice science.

Homologous Vertebrate Genes Database
HOVERGEN is a database of homologous vertebrate genes that allows one to select sets of homologous genes among vertebrate species, and to visualize multiple alignments and phylogenetic trees.

PIRSF; a whole-protein classification database
The PIR SuperFamily concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.

ArachnoServer: Spider toxin database
ArachnoServer is a manually curated database containing information on the sequence, three-dimensional structure, and biological activity of protein toxins derived from spider venom.

Description of Plant Viruses
DPVweb provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa. Comprehensive taxonomic information, including brief descriptions of each family and genus, and classified lists of virus sequences are provided. The database also holds detailed, curated information for all sequences of viruses, viroids and satellites of plants, fungi and protozoa that are complete or that contain at least one complete gene. As of Aug 2013, the database is no longer updated with sequence or taxonomic data.

TargetTrack, a target registration database, provides information on the experimental progress and status of targets selected for structure determination.

The MEROPS database is an information resource for peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them.

Sanger Pfam Mirror
The Pfam database contains information about protein domains and families. For each entry, a protein sequence alignment and a Hidden Markov Model are stored.

The Vertebrate Genome Annotation Database
The Vertebrate Genome Annotation (VEGA) database is a central repository for high quality manual annotation of vertebrate finished genome sequence.

3D interacting domains
The database of 3D Interaction Domains (3did) is a collection of domain-domain interactions in proteins for which high-resolution three-dimensional structures are known. 3did exploits structural information to provide critical molecular details necessary for understanding how interactions occur.

Pseudomonas Genome DB
The Pseudomonas Genome Database is a resource for peer-reviewed, continually updated annotation for all Pseudomonas species. It includes gene and protein sequence information, as well as regulation and predicted function and annotation.

Protein ANalysis THrough Evolutionary Relationships: Classification of Genes and Proteins
The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.

HAMAP database of microbial protein families
HAMAP is a system, based on manual protein annotation, that identifies and semi-automatically annotates proteins that are part of well-conserved families or subfamilies: the HAMAP families. HAMAP is based on manually created family rules and is applied to bacterial, archaeal and plastid-encoded proteins.

PROSITE; a protein domain and family database
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.

Type 1 Diabetes Database
T1DBase focuses on two research areas in type 1 diabetes (T1D): the genetics of T1D susceptibility and beta cell biology.

Domain mapping of disease mutations
Domain mapping of disease mutations (DMDM) is a database in which each disease mutation can be displayed by its gene, protein or domain location.

UniProt Knowledgebase
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The UniProt Knowledgebase consists of two sections: a reviewed section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis (aka "UniProtKB/Swiss-Prot"), and an unreviewed section with computationally analyzed records that await full manual annotation (aka "UniProtKB/TrEMBL").

A CLAssification of Mobile genetic Elements
ACLAME is a database dedicated to the collection and classification of mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons.

CATH Protein Structure Classification
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures. There are four major levels in this hierarchy; Class (secondary structure classification, e.g. mostly alpha), Architecture (classification based on overall shape), Topology (fold family) and Homologous superfamily (protein domains which are thought to share a common ancestor). This collection is concerned with superfamily classification.
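
The four-level hierarchy described above can be sketched in Python as follows. The sketch assumes that a classification code is written as four dot-separated integers, one per level; the example code "3.40.50.300" and the function name are used only for illustration.

```python
from typing import List, Tuple

# Level names as listed in the description above, from broadest to most specific.
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_lineage(code: str) -> List[Tuple[str, str]]:
    """Return (level name, code prefix) pairs for a four-part code."""
    parts = code.strip().split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    # Each level is identified by the prefix of the code up to that depth.
    return [(level, ".".join(parts[: i + 1])) for i, level in enumerate(CATH_LEVELS)]


if __name__ == "__main__":
    for level, prefix in cath_lineage("3.40.50.300"):
        print(f"{level}: {prefix}")
```

Returning the prefix at each depth makes it easy to group entries that share a level, for example everything under the same Topology.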

The Human Metabolome Database
The Human Metabolome Database (HMDB) is a database containing detailed information about small molecule metabolites found in the human body. It contains or links (1) chemical, (2) clinical and (3) molecular biology/biochemistry data.

Toxin and Toxin Target Database
Toxin and Toxin Target Database (T3DB) is a bioinformatics resource that combines detailed toxin data with comprehensive toxin target information.

GpDB is a publicly accessible, relational database of G-proteins and their interactions with GPCRs and effector molecules. The sequences are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature search.

Xenopus laevis and tropicalis biology and genomics resource
Xenbase is the model organism database for Xenopus laevis and X. (Silurana) tropicalis. It contains genomic, development data and community information for Xenopus research. It includes gene expression patterns that incorporate image data from the literature, large scale screens and community submissions.

Database of Interacting Proteins
The Database of Interacting Proteins (DIP) stores experimentally determined interactions between proteins. It combines information from a variety of sources to create a single, consistent set of protein-protein interactions.

VBASE2 is an integrative database of germ-line variable genes from the immunoglobulin loci of human and mouse. All variable gene sequences are extracted from the EMBL-Bank.

miRBase Sequence Database
The miRBase Sequence Database is a searchable database of published miRNA sequences and annotation. The data were previously provided by the miRNA Registry. The miRBase Registry continues to provide gene hunters with unique names for novel miRNA genes prior to publication of results. The miRBase Targets database is a new resource of predicted miRNA targets in animals.

Molecular database for the identification of fungi
UNITE is primarily a fungal rDNA internal transcribed spacer (ITS) sequence database, although they also welcome additional genes and genetic markers. UNITE focuses on high-quality ITS sequences generated from fruiting bodies collected and identified by experts and deposited in public herbaria.

Bacterial Protein Interaction Database
Bacteriome.org is a database integrating physical (protein-protein) and functional interactions within the context of an E. coli knowledgebase.

Poxvirus Bioinformatics Resource Center
Poxvirus Bioinformatics Resource Center was established to provide specialized web-based resources to the scientific community studying poxviruses. This resource is no longer being maintained. For tools and data supporting virus genomics, especially related to poxviruses and other large DNA viruses, please visit the Viral Bioinformatics site maintained by our collaborator, Chris Upton: http://virology.ca. For information on virus taxonomy, please visit the ICTV web site at http://www.ictvonline.org/. For updated sequence data and analytical tools, please visit http://www.viprbrc.org.

VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community. Currently, VectorBase contains genome information for 38 organisms including Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever. Recent additions include large scale variant (SNP) datasets and population genetics data (genotype/phenotype).

Genome Database for Rosaceae
The Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database providing centralized access to Rosaceae genomics and genetics data and analysis tools to facilitate cross-species utilization of data.

Yeast Resource Center Public Data Repository
The National Center for Research Resources' Yeast Resource Center is located at the University of Washington in Seattle, Washington. The mission of the center is to facilitate the identification and characterization of protein complexes in the yeast Saccharomyces cerevisiae.

EBI Metagenomics
"EBI Metagenomics" is a free-to-use resource aiming at supporting all metagenomics researchers. The service is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. You can freely browse all the public data in the repository.

CentrosomeDB is a collection of human and Drosophila centrosomal genes that were reported in the literature and other sources. The database offers the possibility to study the evolution, function, and structure of the centrosome. They have compiled information from many sources, including Gene Ontology, disease-association, single nucleotide polymorphisms, and associated gene expression experiments.

The MOuse NOnCode Lung database
MONOCLdb is an integrative and interactive database designed to retrieve and visualize annotations and expression profiles of long non-coding RNAs (lncRNAs) expressed in Collaborative Cross (http://compgen.unc.edu/) founder mice in response to respiratory influenza and SARS infections.

FlyMine is an integrated database of genomic, expression and protein data for Drosophila, Anopheles and C. elegans. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge.

A database of protein disorder and mobility annotations. MobiDB was designed to offer a centralized resource for annotations of intrinsic protein disorder. The database features three levels of annotation: manually curated, indirect and predicted. Manually curated data is extracted from the DisProt database. Indirect data is inferred from PDB structures that are considered an indication of intrinsic disorder. The 10 predictors currently included (three ESpritz flavors, two IUPred flavors, two DisEMBL flavors, GlobPlot, VSL2b and JRONN) enable MobiDB to provide disorder annotations for every protein in absence of more reliable data. The new version also features a consensus annotation and classification for long disordered regions. In order to complement the disorder annotations, MobiDB features additional annotations from external sources. Annotations from the UniProt database include post-translational modifications and linear motifs. Pfam annotations are displayed in graphical form and are link-enabled, allowing the user to visit the corresponding Pfam page for further information. Experimental protein–protein interactions from STRING are also classified for disorder content.

RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.

Eukaryotic Promoter Database
The Eukaryotic Promoter Database is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally.

Alternative Poly(A) Sites database
APASdb can visualize the precise map and usage quantification of different APA isoforms for all genes. The datasets are deeply profiled by the sequencing alternative polyadenylation sites (SAPAS) method capable of high-throughput sequencing 3'-ends of polyadenylated transcripts. Thus, APASdb details all the heterogeneous cleavage sites downstream of poly(A) signals, and maintains near complete coverage for APA sites. Furthermore, APASdb provides the quantification of a given APA variant among transcripts with different APA sites by computing their corresponding normalized-reads. In addition, APASdb supports URL-based retrieval, browsing and display of exon-intron structure, poly(A) signals, poly(A) sites location and usage reads, and 3'-untranslated regions (3'-UTRs). Currently, APASdb involves APA in various biological processes and diseases in human, mouse and zebrafish.

GrainGenes, a Database for Triticeae and Avena
The GrainGenes website hosts a wealth of information for researchers working on Triticeae species, oat and their wild relatives. The website hosts a database encompassing information such as genetic maps, genes, alleles, genetic markers, phenotypic data, quantitative trait loci studies, experimental protocols and publications. The database can be queried by text searches, browsing, Boolean queries, MySQL commands, or by using pre-made queries created by the curators. GrainGenes is not solely a database, but serves as an informative site for researchers and a means to communicate project aims, outcomes and a forum for discussion.

Proteome-pI : proteome isoelectric point database
Proteome-pI is an online database containing information about predicted isoelectric points for 5,029 proteomes (21 million sequences) calculated using 18 methods. The isoelectric point, the pH at which a particular molecule carries no net electrical charge, is an important parameter for many analytical biochemistry and proteomics techniques, especially for 2D gel electrophoresis (2D-PAGE), capillary isoelectric focusing, liquid chromatography–mass spectrometry and X-ray protein crystallography. The database allows the retrieval of virtual 2D-PAGE plots and the development of customized fractions of proteome based on isoelectric point and molecular weight. Moreover, Proteome-pI facilitates statistical comparisons of the various prediction methods as well as biological investigation of protein isoelectric point space in all kingdoms of life (http://isoelectricpointdb.org/statistics.html). The database includes various statistics and tools for interactive browsing, searching and sorting. It can be searched and browsed by organism name, average isoelectric point, molecular weight or amino acid frequencies. Proteins with extreme pI values are also available. For individual proteomes, users can retrieve proteins of interest given the method, isoelectric point and molecular weight ranges (this particular feature can be highly useful to limit potential targets in analysis of 2D-PAGE gels or before conducting mass spectrometry). Finally, some general statistics (total number of proteins, amino acids, average sequence length, amino acid and di-amino acid frequencies) and datasets corresponding to major protein databases such as UniProtKB/TrEMBL and the NCBI non-redundant (nr) database have also been precalculated.
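
To make the definition of the isoelectric point concrete, the following Python sketch estimates the pH at which a peptide's net charge is zero, using the Henderson-Hasselbalch equation and bisection. It is a generic textbook approach and is not claimed to be any of the 18 methods used by Proteome-pI. The pKa values are one illustrative set and should be treated as assumptions, since published scales differ; the function names are hypothetical.

```python
# Illustrative pKa values (assumed for this sketch; published scales differ).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    # Terminal groups: the N-terminus carries +1 when protonated,
    # the C-terminus carries -1 when deprotonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq.upper():
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where the net charge crosses zero, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    # Net charge decreases monotonically with pH, so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(round(isoelectric_point("MKRDE"), 2))
```

Bisection is used because the net charge is a monotonically decreasing function of pH, so a single zero crossing exists between pH 0 and pH 14.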

Banana Genome Hub
The Banana Genome Hub centralises databases of genetic and genomic data for the Musa acuminata crop, and is the official portal for the Musa genome resources.

BCCM/LMBP Plasmid Collection
The BCCM/LMBP Plasmid Collection warrants the long-term storage and distribution of plasmids, microbial host strains and DNA libraries of fundamental, biotechnological, educational or general scientific importance. The focus is on the collection of recombinant plasmids that can replicate in a microbial host strain. BCCM/LMBP also accepts natural and genetically modified animal or human cell lines, including hybridomas, as well as other genetic material, in the safe deposit and patent deposit collections.

LiceBase is a database for sea lice genomics. LiceBase provides the genome annotation of the Atlantic salmon louse Lepeophtheirus salmonis, a genome browser, BLAST functionality and access to related high-throughput genomics data.

The Triticeae Toolbox
The Triticeae Toolbox (T3) is a repository for public wheat data generated by the Wheat Coordinated Agricultural Project (Wheat CAP). Funding is provided by the National Institute for Food and Agriculture (NIFA) and the United States Department of Agriculture (USDA). The current project is funded through NIFA's International Wheat Yield Partnership (IWYP) and part of the Agriculture and Food Research Initiative (AFRI).

Visual Database for Organelle Genome
VDOG, the Visual Database for Organelle Genome, is an innovative database of genome information for organelles. Most of the data in VDOG were originally extracted from GenBank, then re-organized and re-presented.

Implementing Policies

This record is not implemented by any policy.


Record Maintainer



Rapid and sensitive protein similarity searches.

Lipman DJ, Pearson WR
Science 1985