How to cite this record FAIRsharing.org: NCBITAXON; NCBI Taxonomy; DOI: 10.25504/FAIRsharing.fj07xj; Last edited: July 16, 2018, 10:57 p.m.; Last accessed: Jul 20 2018 7:24 a.m.
Record updated: July 16, 2018, 10:57 p.m. by The FAIRsharing Team.
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 17:38, 19 Jul 2018 (approved): 'obo_abbreviation' has been modified: Before: ncbitaxon After: ncbitaxon
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'schoch2' at 17:38, 19 Jul 2018 (approved): 'homepage' has been modified: Before: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/ After: https://www.ncbi.nlm.nih.gov/taxonomy
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 17:38, 19 Jul 2018 (approved): 'domains' has been modified: Before: Life Science|domain|http://www.fairsharing.org/ontology/SRAO_0000069 Taxonomic classification|process|http://edamontology.org/data_1872 After: Life Science|domain|http://www.fairsharing.org/ontology/SRAO_0000069 Taxonomic classification|process|http://edamontology.org/data_1872 DNA sequence data|process|http://edamontology.org/data_3494 Added: DNA sequence data|process|http://edamontology.org/data_3494 Removed:
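The 'domains' change above lists entries as label|kind|URI triples. The following is a minimal Python sketch of how the Added and Removed lists could be derived from the Before and After strings. It assumes that entries are separated by whitespace and that labels contain no pipe character. The names ENTRY, parse_domains and diff_domains are hypothetical and are not part of any FAIRsharing tool or API.

```python
import re

# Assumption: each entry is "label|kind|URI" and entries are separated by
# whitespace, as in the edit log above. Labels may contain spaces.
ENTRY = re.compile(r"\s*(.+?)\|([^|\s]+)\|(\S+)")


def parse_domains(field):
    """Return a list of (label, kind, uri) tuples from one domains string."""
    return [match.groups() for match in ENTRY.finditer(field)]


def diff_domains(before, after):
    """Return (added, removed) entries, preserving their original order."""
    old, new = parse_domains(before), parse_domains(after)
    added = [entry for entry in new if entry not in old]
    removed = [entry for entry in old if entry not in new]
    return added, removed


before = (
    "Life Science|domain|http://www.fairsharing.org/ontology/SRAO_0000069 "
    "Taxonomic classification|process|http://edamontology.org/data_1872"
)
after = before + " DNA sequence data|process|http://edamontology.org/data_3494"

added, removed = diff_domains(before, after)
print("Added:", added)
print("Removed:", removed)
```

With the strings from the log entry, the only added entry is the "DNA sequence data" triple and nothing is removed, which matches the Added and Removed fields recorded above.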
|online documentation||http://purl.bioontology.org/ontology/NCB ...|
No XSD schemas defined
No identifier schema standards defined
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. The complete release notes for the current version of GenBank are available on the NCBI ftp site. A new release is made every two months. GenBank growth statistics for both the traditional GenBank divisions and the WGS division are available from each release.
DNA Data Bank of Japan
Annotated collection of all publicly available nucleotide and protein sequences. The DDBJ Center in Japan is a member of the INSDC and, together with ENA/EBI in Europe and NCBI in the USA, collects and provides nucleotide sequence data. DDBJ collects sequence data mainly from Japanese researchers, but also from researchers in other countries. Ninety-nine percent of INSD data from Japanese researchers are submitted through DDBJ.
Eukaryotic Linear Motifs
This computational biology resource mainly focuses on annotation and detection of eukaryotic linear motifs (ELMs) by providing both a repository of annotated motif data and an exploratory tool for motif prediction. ELMs, or short linear motifs (SLiMs), are compact protein interaction sites composed of short stretches of adjacent amino acids.
Pfam Protein Families
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or more functional regions, commonly termed domains. Different combinations of domains give rise to the diverse range of proteins found in nature. The identification of domains that occur within proteins can therefore provide insights into their function. Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
This resource clusters UniProt protein sequences into hierarchical trees. It allows the sub-families and super-families of a protein to be studied using UniRef50 clusters.
Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) advances understanding of the effects of environmental chemicals on human health. Biocurators manually curate chemical-gene, chemical-disease, and gene-disease relationships from the scientific literature. This core data is then internally integrated to generate inferred chemical-gene-disease networks. Additionally, the core data is integrated with external data sets (such as Gene Ontology and pathway annotations) to predict many novel associations between different data types. A distinctive feature of CTD is the set of inferred relationships generated by this data integration, which help turn knowledge into discoveries by identifying novel connections between chemicals, genes, diseases, pathways, and GO annotations that might not be apparent from other biological resources.
ArchDB is a compilation of structural classifications of loops extracted from known protein structures. The structural classification is based on the geometry and conformation of the loop. The geometry is defined by four internal variables and the type of regular flanking secondary structures, resulting in 10 different loop types. Loops in ArchDB have been classified using an improved version (Espadaler et al.) of the original ArchType program published in 1997 by Oliva et al.
VirHostNet 2.0 integrates an extensive and original literature-curated dataset of virus/virus and virus/host protein-protein interactions complemented with publicly available data.
A CLAssification of Mobile genetic Elements
ACLAME is a database dedicated to the collection and classification of mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons.
Giga Science Database
GigaDB primarily serves as a repository to host data and tools associated with articles in GigaScience; however, it also includes a subset of datasets that are not associated with GigaScience articles. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study.
UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data.
probeBase is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. The major features of probeBase include a classification of probes and primers according to the NCBI taxonomy database, a powerful and customizable search function, which serves to query for target organisms, probe names, primers, target sites, and references. The probeBase match tool can be used to match near-full length rRNA sequences against probeBase and find all published probes targeting the query sequences. The new proxy match tool extends this analysis to partial rRNA sequences, which exploits full-length sequences in the rRNA sequence database SILVA to find published probes potentially targeting partial query sequences. A tool for submitting new or missing probe sequences or references helps to keep probeBase up-to-date.
The FAIRDOMHub is a publicly available resource built using the SEEK software, which enables collaborations within the scientific community. FAIRDOM will establish a support and service network for European Systems Biology. It will serve projects in standardizing, managing and disseminating data and models in a FAIR manner: Findable, Accessible, Interoperable and Reusable. FAIRDOM is an initiative to develop a community and establish an internationally sustained Data and Model Management service for the European Systems Biology community. FAIRDOM is a joint action of ERA-Net EraSysAPP and European Research Infrastructure ISBE.
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE results from 2007 and later are available from this project. This covers data generated during the two production phases 2007-2012 and 2013-present.
Microenvironment Perturbagen LINCS Center image server
The MEP LINCS project contributes to the development of the NIH Library of Integrated Network-based Cellular Signatures (LINCS) program by developing a dataset and computational strategy to elucidate how microenvironment (ME) signals affect cell intrinsic intracellular transcriptional- and protein-defined molecular networks to generate experimentally observable cellular phenotypes measured by high-content imaging.
GrainGenes, a Database for Triticeae and Avena
The GrainGenes website hosts a wealth of information for researchers working on Triticeae species, oat and their wild relatives. The website hosts a database encompassing information such as genetic maps, genes, alleles, genetic markers, phenotypic data, quantitative trait loci studies, experimental protocols and publications. The database can be queried by text searches, browsing, Boolean queries, MySQL commands, or by using pre-made queries created by the curators. GrainGenes is not solely a database, but serves as an informative site for researchers and a means to communicate project aims, outcomes and a forum for discussion.
MorphoBank is a web application providing an online database and workspace for evolutionary research in systematics (the science of determining the evolutionary relationships among species). MorphoBank invites scientists producing peer-reviewed research to upload images and affiliate data with those images (labels, species names, etc.). MorphoBank also offers a platform for live collaboration on phylogenetic matrices by teams in a private workspace, where they can also affiliate images with phylogenetic matrices. MorphoBank stores digital versions of both text and image-based observations on phenotypes. Supported data include phylogenetic matrices (Nexus or TNT format), particularly phenotypical matrices, 2D (including JPEG, GIF, PNG, TIFF and Photoshop) and 3D (PLY, STL, ZIP, TIFF and DCM) image data, and video (MPEG-4, QuickTime and WindowsMedia). MorphoBank also offers a Documents folder where users can store additional files about their research, such as PDFs, Word documents, and text files (e.g., morphometric data, phylogenetic trees).
The Ensembl genome annotation system, developed jointly by the EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000. Since 2009, the Ensembl site has been complemented by the creation of five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to use a single collection of (interactive and programmatic) interfaces for accessing and comparing genome-scale data from species of scientific interest from across the taxonomy.
Over 30,000 genome sequences from bacteria and archaea have been annotated and deposited in the public archives of the members of the International Nucleotide Sequence Database Collaboration. This site provides access to complete, annotated genomes from bacteria and archaea (present in the European Nucleotide Archive) through the Ensembl graphical user interface (genome browser).
From release 27 onwards, all protist genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Protists. The release now consists of a total of over 150 genomes, of which over 100 have been taken directly from the INSDC archives and the remainder from other sources. The new genomes have been functionally annotated with InterPro entries and GO terms using InterPro v53.
A new genome assembly of Triticum aestivum cv. Chinese Spring is now available in Ensembl Plants. The assembly (TGACv1) and its accompanying annotation were produced by the Earlham Institute, formerly The Genome Analysis Centre (TGAC), as part of the Triticeae Genomics for Sustainable Agriculture project.
From release 28 onwards, all fungal genomes whose sequence and annotation have been completed and submitted to the International Nucleotide Sequence Database Collaboration (i.e. the ENA, GenBank and DDBJ databases) are available in Ensembl Fungi. The release now consists of a total of 589 genomes, of which 536 have been taken from the archives and 53 taken directly from other sources.
This site provides access to complete, annotated genomes from metazoa through the Ensembl graphical user interface (genome browser).
Hardwood Genomics Project
The Hardwood Genomics Project is a database for expressed genes, genetic markers, genetic linkage maps, and reference populations. It provides lasting genomic and biological resources for the discovery and conservation of genes in hardwood trees for growth, adaptation and responses to environmental stresses such as drought, heat, insect pests and disease. All original sequence data is being deposited in NCBI's Sequence Read Archive, and the genetic linkage maps and associated marker data will be available at the Dendrome database.
Visual Database for Organelle Genome
VDOG (Visual Database for Organelle Genome) is a database of organelle genome information. Most of the data in VDOG were originally extracted from GenBank, then reorganized and represented.
Project Tycho: Data for Health
In 2013, we released the first version of Project Tycho containing weekly case counts for 50 notifiable conditions reported by health agencies in the United States for 50 states and 1284 cities between 1888 and 2014. Over the past four years, over 3700 users have registered to use Project Tycho data for a total of 40 creative works including peer-reviewed research papers, visualizations, online applications, and newspaper articles. Project Tycho 2.0 has expanded its scope to a global level and improved standardization, following FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles where possible. Project Tycho 2.0 includes case counts for 28 additional notifiable conditions for the US and includes data for dengue-related conditions for 99 countries between 1955 and 2010, obtained from the World Health Organization and Ministries of Health. Project Tycho 2.0 datasets are represented in a standard format and include standard SNOMED-CT codes for reported conditions, ISO 3166 codes for countries and first administrative level subdivisions, and NCBI TaxonID numbers for pathogens. Metadata for Project Tycho datasets are available on the website in human-readable format, but also in machine-interpretable DATS and DataCite metadata files.
A resource providing data on bioentities and their associated ontology terms for Plant Biology. The database provides access to ontology-based annotations of genes, phenotypes and germplasms from about 90 plant species. A number of internal and external ontologies are used to annotate the biological data available from this resource.
The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate. To achieve this, OBO Foundry participants voluntarily adhere to and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, and common syntax and relations, based on ontology models that work well, such as the Gene Ontology (GO). The OBO Foundry is overseen by an Operations Committee with Editorial, Technical and Outreach working groups.
This record is maintained by schoch2