How to cite this record: FAIRsharing.org: NCBITAXON; NCBI Taxonomy; DOI: https://doi.org/10.25504/FAIRsharing.fj07xj; Last edited: Jan. 29, 2020, 3:14 p.m.; Last accessed: Aug 14 2020 1:18 a.m.
Publication for citation: The NCBI Taxonomy database. Federhen S; Nucleic Acids Res; 2012; 10.1093/nar/gkr1178
Record updated: Jan. 29, 2020, 3:13 p.m. by The FAIRsharing Team.
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 15:13, 29 Jan 2020 (approved): 'supportLinks' has been modified: Before: help|http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=howlink help|http://www.ncbi.nlm.nih.gov/books/NBK53758/ help|http://www.ncbi.nlm.nih.gov/books/NBK21100/ online documentation|http://purl.bioontology.org/ontology/NCBITaxon online documentation|https://github.com/obophenotype/ncbitaxon After: help|http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=howlink help|http://www.ncbi.nlm.nih.gov/books/NBK53758/ help|http://www.ncbi.nlm.nih.gov/books/NBK21100/ online documentation|https://github.com/obophenotype/ncbitaxon Added: Removed: online documentation|http://purl.bioontology.org/ontology/NCBITaxon 'dataProcesses' has been modified: Before: FTP download Name/ID Status After: FTP download Name/ID Status AgroPortal: Browse / Download Added: AgroPortal: Browse / Download Removed:
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 19:54, 04 May 2019 (approved): 'onto_disciplines' has been modified: Before: Life Sciences After: Life Sciences Ontology and Terminology Phylogenetics Taxonomy Added: Ontology and Terminology Phylogenetics Taxonomy Removed: 'dataProcesses' has been modified: Before: FTP download After: FTP download Name/ID Status Added: Name/ID Status Removed:
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 12:33, 11 Mar 2019 (approved): 'licences' has been modified: Before: After: Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication|https://creativecommons.org/publicdomain/zero/1.0/|Data Added: Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication|https://creativecommons.org/publicdomain/zero/1.0/|Data Removed: 'supportLinks' has been modified: Before: help|http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=howlink help|http://www.ncbi.nlm.nih.gov/books/NBK53758/ help|http://www.ncbi.nlm.nih.gov/books/NBK21100/ online documentation|http://purl.bioontology.org/ontology/NCBITaxon After: help|http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=howlink help|http://www.ncbi.nlm.nih.gov/books/NBK53758/ help|http://www.ncbi.nlm.nih.gov/books/NBK21100/ online documentation|http://purl.bioontology.org/ontology/NCBITaxon online documentation|https://github.com/obophenotype/ncbitaxon Added: online documentation|https://github.com/obophenotype/ncbitaxon Removed:
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 12:03, 27 Jul 2018 (approved): 'yearOfCreation' has been modified: Before: None After: 2002
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 12:00, 27 Jul 2018 (approved): 'domains' has been modified: Before: DNA sequence data Life Science Taxonomic classification After: Life Science Taxonomic classification Added: Removed: DNA sequence data
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 22:44, 15 Jul 2018 (approved): 'domains' has been modified: Before: Life Science|domain|http://www.fairsharing.org/ontology/SRAO_0000069 Taxonomic classification|process|http://edamontology.org/data_1872 After: Life Science|domain|http://www.fairsharing.org/ontology/SRAO_0000069 Taxonomic classification|process|http://edamontology.org/data_1872 DNA sequence data|process|http://edamontology.org/data_3494 Added: DNA sequence data|process|http://edamontology.org/data_3494 Removed:
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'schoch2' at 20:44, 11 Jan 2018 (approved): 'homepage' has been modified: Before: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/ After: https://www.ncbi.nlm.nih.gov/taxonomy
Edits to 'https://fairsharing.org/FAIRsharing.fj07xj' by 'The FAIRsharing Team' at 19:26, 09 May 2017 (approved): 'obo_abbreviation' has been modified: Before: ncbitaxon After: ncbitaxon
|help||Linking to the NCBI Taxonomy Database|
|help||Entrez Taxonomy Quick Start|
|help||NCBI Handbook: The Taxonomy Project|
|online documentation||GitHub Repository (for NCBITaxon Ontolog ...|
Conditions of Use: Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication (applies to: Data use)
|AgroPortal: Browse / Download||http://agroportal.lirmm.fr/ontologies/NCBITAXON|
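The record lists FTP download among its data access methods. As a minimal sketch, assuming the `nodes.dmp` layout of NCBI's taxdump archive (fields separated by `\t|\t`, each line terminated by `\t|`, with tax_id, parent tax_id and rank as the first three fields), the lineage of a taxon can be recovered by following parent links until reaching the root, which is recorded as its own parent. The helper names and the sample rows are hypothetical; the IDs are toy values, not real taxa.

```python
from typing import Dict, Iterable, List, Tuple


def parse_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map each tax_id to (parent tax_id, rank) from nodes.dmp-style lines."""
    nodes: Dict[int, Tuple[int, str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Drop the "\t|" terminator, then split on the "\t|\t" field separator.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        nodes[tax_id] = (parent_id, rank)
    return nodes


def lineage(tax_id: int, nodes: Dict[int, Tuple[int, str]]) -> List[int]:
    """Return [tax_id, parent, grandparent, ..., root] by following parent links."""
    path = [tax_id]
    seen = {tax_id}
    while True:
        parent = nodes[tax_id][0]
        if parent == tax_id:  # the root node is recorded as its own parent
            return path
        if parent in seen:
            raise ValueError(f"cycle detected at tax_id {parent}")
        path.append(parent)
        seen.add(parent)
        tax_id = parent


# Hypothetical toy rows in nodes.dmp layout (IDs are illustrative, not real taxa).
SAMPLE = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tsuperkingdom\t|\n",
    "20\t|\t10\t|\tgenus\t|\n",
    "30\t|\t20\t|\tspecies\t|\n",
]

nodes = parse_nodes(SAMPLE)
path = lineage(30, nodes)
print(path)                         # [30, 20, 10, 1]
print([nodes[t][1] for t in path])  # ['species', 'genus', 'superkingdom', 'no rank']
```

Only the first three columns are read, so the extra columns present in a real dump are ignored.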
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. The complete release notes for the current version of GenBank are available on the NCBI FTP site. A new release is made every two months, and GenBank growth statistics for both the traditional GenBank divisions and the WGS division are available from each release. GenBank is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the NCBI. These three organizations exchange data on a daily basis.
DNA Data Bank of Japan
DDBJ is an annotated collection of all publicly available nucleotide and protein sequences. DDBJ collects sequence data mainly from Japanese researchers, but also from researchers in other countries. DDBJ is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the NCBI. These three organizations exchange data on a daily basis.
Eukaryotic Linear Motifs
This computational biology resource mainly focuses on annotation and detection of eukaryotic linear motifs (ELMs) by providing both a repository of annotated motif data and an exploratory tool for motif prediction. ELMs, or short linear motifs (SLiMs), are compact protein interaction sites composed of short stretches of adjacent amino acids.
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Pfam also generates higher-level groupings of related entries, known as clans. A clan is a collection of Pfam entries which are related by similarity of sequence, structure or profile-HMM.
This resource is a hierarchical clustering of UniProt protein sequences into trees. It allows the sub-families and super-families of a protein to be studied using UniRef50 clusters.
Comparative Toxicogenomics Database
The Comparative Toxicogenomics Database (CTD) advances understanding of the effects of environmental chemicals on human health. Biocurators manually curate chemical-gene, chemical-disease, and gene-disease relationships from the scientific literature. These core data are then integrated internally to generate inferred chemical-gene-disease networks. Additionally, the core data are integrated with external data sets (such as Gene Ontology and pathway annotations) to predict many novel associations between different data types. A unique and powerful feature of CTD is its set of inferred relationships, generated by data integration, which help turn knowledge into discoveries by identifying novel connections between chemicals, genes, diseases, pathways, and GO annotations that might not otherwise be apparent using other biological resources.
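The idea of an inferred chemical-disease relationship can be illustrated with a small sketch: if a chemical is curated as interacting with a gene, and that gene is curated as associated with a disease, the chemical and disease become linked, with the gene kept as supporting evidence. This is an assumed simplification for illustration only; it does not reproduce CTD's actual pipeline or scoring, and all names below are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def infer_chemical_disease(
    chem_gene: Iterable[Pair], gene_disease: Iterable[Pair]
) -> Dict[Pair, Set[str]]:
    """Infer (chemical, disease) links through shared genes.

    Returns each inferred pair together with the genes that support it.
    """
    diseases_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred: Dict[Pair, Set[str]] = defaultdict(set)
    for chemical, gene in chem_gene:
        # A chemical that interacts with a gene is linked to every disease
        # curated for that gene; the gene is kept as the supporting evidence.
        for disease in diseases_by_gene.get(gene, set()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)


# Hypothetical curated edges (placeholders, not CTD data).
chem_gene = [("chemA", "GENE1"), ("chemA", "GENE2"), ("chemB", "GENE2")]
gene_disease = [("GENE1", "diseaseX"), ("GENE2", "diseaseX"), ("GENE2", "diseaseY")]

for pair, genes in sorted(infer_chemical_disease(chem_gene, gene_disease).items()):
    print(pair, sorted(genes))
```

Keeping the supporting genes alongside each inferred pair makes the inference traceable back to the curated edges it came from.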
European Nucleotide Archive
The European Nucleotide Archive (ENA) is a globally comprehensive data resource for nucleotide sequences, spanning raw data, alignments and assemblies, functional and taxonomic annotation and rich contextual data relating to sequenced samples and experimental design. Serving both as the database of record for the output of the world's sequencing activity and as a platform for the management, sharing and publication of sequence data, the ENA provides a portfolio of services for submission, data management, search and retrieval across web and programmatic interfaces. The ENA is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the NCBI. These three organizations exchange data on a daily basis.
ArchDB is a compilation of structural classifications of loops extracted from known protein structures. The structural classification is based on the geometry and conformation of the loop. The geometry is defined by four internal variables and the type of regular flanking secondary structures, resulting in 10 different loop types. Loops in ArchDB have been classified using an improved version (Espadaler et al.) of the original ArchType program published in 1997 by Oliva et al.
VirHostNet 2.0 integrates an extensive and original literature-curated dataset of virus/virus and virus/host protein-protein interactions complemented with publicly available data.
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The UniProt Knowledgebase consists of two sections: a reviewed section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis (aka "UniProtKB/Swiss-Prot"), and an unreviewed section with computationally analyzed records that await full manual annotation (aka "UniProtKB/TrEMBL").
A CLAssification of Mobile genetic Elements
ACLAME is a database dedicated to the collection and classification of mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons.
Giga Science Database
GigaDB primarily serves as a repository to host data and tools associated with articles in GigaScience; however, it also includes a subset of datasets that are not associated with GigaScience articles. GigaDB defines a dataset as a group of files (e.g., sequencing data, analyses, imaging files, software programs) that are related to and support an article or study.
UniCarbKB is an initiative that aims to promote the creation of an online information storage and search platform for glycomics and glycobiology research. The knowledgebase will offer a freely accessible and information-rich resource supported by querying interfaces, annotation technologies and the adoption of common standards to integrate structural, experimental and functional data.
probeBase is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. The major features of probeBase include a classification of probes and primers according to the NCBI taxonomy database and a powerful, customizable search function for querying target organisms, probe names, primers, target sites, and references. The probeBase match tool can be used to match near-full-length rRNA sequences against probeBase and find all published probes targeting the query sequences. The new proxy match tool extends this analysis to partial rRNA sequences by exploiting full-length sequences in the rRNA sequence database SILVA to find published probes that potentially target partial query sequences. A tool for submitting new or missing probe sequences or references helps to keep probeBase up to date.
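The core check behind matching a probe against an rRNA sequence can be sketched as follows: a probe written 5'->3' hybridizes to the stretch of rRNA equal to its reverse complement, so one scans the query for that stretch within a mismatch threshold. This is an assumed simplification; it is not probeBase's match tool and ignores hybridization thermodynamics. The example uses the widely used bacterial probe EUB338; the rRNA fragment is a made-up toy string that contains its target site, and the function names are hypothetical.

```python
from typing import List, Tuple

# Watson-Crick complements; U (RNA) pairs with A just like T.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA string, returned in the DNA alphabet."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def target_sites(probe: str, rrna: str, max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatch count) for every window the probe could bind."""
    site = reverse_complement(probe)          # stretch of rRNA the probe pairs with
    target = rrna.upper().replace("U", "T")   # compare in the DNA alphabet
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


EUB338 = "GCTGCCTCCCGTAGGAGT"                 # probe, written 5'->3'
toy_rrna = "GGAUUAACUCCUACGGGAGGCAGCAGUGG"    # made-up fragment containing the target site

print(target_sites(EUB338, toy_rrna))         # [(6, 0)]
```

Raising `max_mismatches` mimics a more permissive in silico evaluation, at the cost of reporting weaker potential binding sites.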
The FAIRDOMHub is a publicly available resource built using the SEEK software, which enables collaborations within the scientific community. FAIRDOM will establish a support and service network for European Systems Biology. It will serve projects in standardizing, managing and disseminating data and models in a FAIR manner: Findable, Accessible, Interoperable and Reusable. FAIRDOM is an initiative to develop a community, and establish an internationally sustained Data and Model Management service to the European Systems Biology community. FAIRDOM is a joint action of ERA-Net EraSysAPP and European Research Infrastructure ISBE.
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active. ENCODE results from 2007 onwards are available from this project, covering data generated during the two production phases, 2007-2012 and 2013-present.
GrainGenes, a Database for Triticeae and Avena
The GrainGenes website hosts a wealth of information for researchers working on Triticeae species, oat and their wild relatives. The website hosts a database encompassing information such as genetic maps, genes, alleles, genetic markers, phenotypic data, quantitative trait loci studies, experimental protocols and publications. The database can be queried by text searches, browsing, Boolean queries, MySQL commands, or by using pre-made queries created by the curators. GrainGenes is not solely a database: it also serves as an informative site for researchers, a means of communicating project aims and outcomes, and a forum for discussion.
MorphoBank is a web application providing an online database and workspace for evolutionary research in systematics (the science of determining the evolutionary relationships among species). MorphoBank invites scientists producing peer-reviewed research to upload images and affiliate data with those images (labels, species names, etc.). MorphoBank also offers a platform for live collaboration on phylogenetic matrices by teams working in a private workspace, where they can also affiliate images with phylogenetic matrices. MorphoBank stores digital versions of both text- and image-based observations on phenotypes. Supported data include phylogenetic matrices (Nexus or TNT format), particularly phenotypic matrices; 2D images (including JPEG, GIF, PNG, TIFF and Photoshop); 3D images (PLY, STL, ZIP, TIFF and DCM); and video (MPEG-4, QuickTime and Windows Media). MorphoBank also offers a Documents folder for additional files about the research, such as PDFs, Word documents, and text files (e.g., morphometric data, phylogenetic trees).
The Ensembl genome annotation system, developed jointly by EMBL-EBI and the Wellcome Trust Sanger Institute, has been used for the annotation, analysis and display of vertebrate genomes since 2000. Since 2009, the Ensembl site has been complemented by the creation of five new sites, for bacteria, protists, fungi, plants and invertebrate metazoa, enabling users to use a single collection of (interactive and programmatic) interfaces for accessing and comparing genome-scale data from species of scientific interest from across the taxonomy.
Ensembl Bacteria is a browser for bacterial and archaeal genomes. These are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Data Bank of Japan). As of release 35 (April 2017), we have only integrated new sequences that are non-redundant when compared to the existing data set, according to the criteria of the UniProt Knowledgebase (DOI: 10.1093/database/baw139).
Ensembl Protists holds over 240 genomes of organisms involved in disease or otherwise of scientific interest. These include the genomes of Plasmodium falciparum, Dictyostelium discoideum, Phytophthora infestans and Leishmania major. The majority of these genomes are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at EMBL-EBI, GenBank at the NCBI, and the DNA Data Bank of Japan); in some cases, the annotation has been taken directly from the websites of the data generators.
Ensembl Plants holds the genomes of plants of significant interest, ranging from species of agricultural importance to those that support primary research or are of environmental interest. Ensembl Plants datasets are constructed in direct collaboration with the Gramene resource. The resource holds the genomes of wheat, rice, corn and mouse-ear cress, among others.
Ensembl Fungi is a browser for fungal genomes. The majority of these are taken from the databases of the International Nucleotide Sequence Database Collaboration (the European Nucleotide Archive at the EBI, GenBank at the NCBI, and the DNA Data Bank of Japan); in some cases, the annotation has been taken directly from the websites of the data generators. As of release 47, Ensembl Fungi contained over 1000 genomes of interest.
Ensembl Metazoa provides access to the genomes of metazoans of interest in disease, environmental science and agriculture, or of economic concern. Coverage is extensive for Diptera, nematodes, Lepidoptera and Hymenoptera.
Hardwood Genomics Project
The Hardwood Genomics Project is a database of expressed genes, genetic markers, genetic linkage maps, and reference populations. It provides lasting genomic and biological resources for the discovery and conservation of genes in hardwood trees for growth, adaptation and responses to environmental stresses such as drought, heat, insect pests and disease. All original sequence data are being deposited in NCBI's Sequence Read Archive, and the genetic linkage maps and associated marker data will be available at the Dendrome database.
Visual Database for Organelle Genome
VDOG (Visual Database for Organelle Genome) is an innovative database of organelle genome information. Most of the data in VDOG were originally extracted from GenBank, then re-organized and re-presented.
Harvard Dataverse is a research data repository running on the open source web application Dataverse. Harvard Dataverse is fully open to the public, allows upload and browsing of data from all fields of research, and is free for all researchers worldwide (up to 1 TB). Links to related grants, authors, software and research products are provided. Harvard Dataverse supports managed access (with an access request workflow) as well as embargoes, both in general and during peer review. Dataverse allows users to share, preserve, cite, explore, and analyse research data. It facilitates making data available to others, and makes it easier to replicate others' work. Researchers, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility. Harvard Dataverse receives support from Harvard University, public and private grants, and an emergent consortium model.
Project Tycho: Data for Health
In 2013, we released the first version of Project Tycho containing weekly case counts for 50 notifiable conditions reported by health agencies in the United States for 50 states and 1284 cities between 1888 and 2014. Over the past four years, over 3700 users have registered to use Project Tycho data for a total of 40 creative works including peer-reviewed research papers, visualizations, online applications, and newspaper articles. Project Tycho 2.0 has expanded its scope to a global level and improved standardization, following FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles where possible. Project Tycho 2.0 includes case counts for 28 additional notifiable conditions for the US and includes data for dengue-related conditions for 99 countries between 1955 and 2010, obtained from the World Health Organization and Ministries of Health. Project Tycho 2.0 datasets are represented in a standard format and include standard SNOMED-CT codes for reported conditions, ISO 3166 codes for countries and first administrative level subdivisions, and NCBI TaxonID numbers for pathogens. Metadata for Project Tycho datasets are available on the website in human-readable format, but also in machine-interpretable DATS and DataCite metadata files.
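Because every Project Tycho 2.0 record carries the same standard identifiers (SNOMED-CT condition codes, ISO 3166 location codes and NCBI TaxonIDs for pathogens), counts from different sources can be aggregated or joined on those keys, for example to link a pathogen back to the NCBI Taxonomy. The sketch below assumes a simplified record layout; the field names are hypothetical rather than Tycho's actual column names, and the condition codes and taxon IDs are placeholders, not real SNOMED-CT codes or TaxonIDs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CaseCount:
    # Hypothetical field names mirroring the coding scheme described above.
    condition_snomed: str   # SNOMED-CT code of the reported condition
    country_iso: str        # ISO 3166 country code
    pathogen_taxon_id: int  # NCBI TaxonID of the pathogen
    week_start: str         # start date of the reporting week (ISO 8601)
    cases: int


def totals_by_country_and_pathogen(rows: Iterable[CaseCount]) -> Counter:
    """Sum case counts per (country, pathogen) pair across all weeks and conditions."""
    totals: Counter = Counter()
    for row in rows:
        totals[(row.country_iso, row.pathogen_taxon_id)] += row.cases
    return totals


# Placeholder records: codes and taxon IDs are illustrative only.
rows = [
    CaseCount("COND-1", "BR", 900001, "1990-01-01", 5),
    CaseCount("COND-1", "BR", 900001, "1990-01-08", 7),
    CaseCount("COND-1", "US", 900001, "1990-01-01", 2),
]
print(totals_by_country_and_pathogen(rows))
```

Keying on the standard codes rather than free-text names is what lets records from different health agencies be combined without manual reconciliation.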
A resource providing data on bioentities and their associated ontology terms for Plant Biology. The database provides access to ontology-based annotations of genes, phenotypes and germplasms from about 90 plant species. A number of internal and external ontologies are used to annotate the biological data available from this resource.
The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate. To achieve this, OBO Foundry participants voluntarily adhere to and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, and common syntax and relations, based on ontology models that work well, such as the Gene Ontology (GO). The OBO Foundry is overseen by an Operations Committee with Editorial, Technical and Outreach working groups.
Metabolomic Repository Bordeaux
MeRy-B is a plant metabolomics platform allowing the storage and visualisation of Nuclear Magnetic Resonance (NMR) metabolic profiles from plants.
Genome properties is an annotation system whereby functional attributes can be assigned to a genome, based on the presence of a defined set of protein signatures within that genome. This is a reimplementation at EMBL-EBI of a resource previously hosted at JCVI.
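Assigning a property from the presence of a defined set of protein signatures can be sketched as follows: each step of a property lists the signatures that can satisfy it, and the property is called according to how many steps have evidence in the genome. The three-way outcome mirrors the YES/PARTIAL/NO calls used by Genome Properties, but the rule here (all steps satisfied gives YES, none gives NO) is a simplification assumed for illustration; the real system distinguishes required from optional steps. The step names and signature IDs are hypothetical placeholders.

```python
from typing import Dict, Set


def assign_property(steps: Dict[str, Set[str]], found: Set[str]) -> str:
    """Classify a genome property as YES, PARTIAL or NO.

    steps maps each step of the property to the protein signatures that can
    satisfy it; found holds the signatures detected in the genome.
    """
    if not steps:
        raise ValueError("a property needs at least one step")
    satisfied = sum(1 for signatures in steps.values() if signatures & found)
    if satisfied == len(steps):
        return "YES"       # every step has supporting evidence
    if satisfied == 0:
        return "NO"        # no step is supported
    return "PARTIAL"       # some, but not all, steps are supported


# Hypothetical property with two steps; signature IDs are placeholders.
pathway = {"step1": {"SIG_A"}, "step2": {"SIG_B", "SIG_C"}}
print(assign_property(pathway, {"SIG_A", "SIG_C"}))  # YES
print(assign_property(pathway, {"SIG_A"}))           # PARTIAL
print(assign_property(pathway, set()))               # NO
```

Allowing several alternative signatures per step captures the common case where different protein families can perform the same function.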
TargetMine integrates many types of data for human, rat and mouse. Flexible queries, export of results and data analysis are available.
e-cienciaDatos is a multidisciplinary data repository that houses the scientific datasets of researchers from the public universities of the Community of Madrid and the UNED, members of the Consorcio Madroño, in order to give visibility to these data. The purpose of this repository is to ensure data preservation and to facilitate data access and reuse. e-cienciaDatos collects datasets from each of the member universities. e-cienciaDatos offers the deposit and publication of datasets, assigning a digital object identifier (DOI) to each of them. The association of a dataset with a DOI will facilitate data verification, dissemination, reuse, impact and long-term access. In addition, the repository provides a standardized citation for each dataset, which contains sufficient information, including the DOI, for the dataset to be identified and located.
DES-TOMATO is a topic-specific literature exploration system developed to allow the exploration of information related to tomato. The information provided in DES-TOMATO is obtained through the text-mining of available scientific literature, namely full-length articles in PubMed Central and titles and abstracts in PubMed.
Disbiome is a database covering microbial composition changes in different kinds of diseases. Disease names, detection methods or organism names can be used as search queries that return information related to the experiment (related disease/bacteria, abundance in subject versus control, control type, detection method and related literature).
PathoSystems Resource Integration Center Repository
PATRIC is part of the Bacterial Bioinformatics Resource Center, and is an information system designed to support the biomedical research community’s work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools. PATRIC provides an interface for biologists to discover data and information and conduct comprehensive comparative genomics and other analyses. PATRIC includes over 250 000 publicly available microbial genomes and tools for comparative analysis.
The BioSample database was developed to serve as a central location in which to capture and store descriptive information about the biological source materials, or samples, used to generate experimental data in any of DDBJ’s primary data archives. Typical examples of a BioSample include a cell line, a primary tissue biopsy, an individual organism, or an environmental isolate.
This record is maintained by schoch2
National Institutes of Health (NIH), Bethesda, MD, USA (Government body)
U.S. National Library of Medicine (Government body)
National Center for Biotechnology Information (NCBI), Rockville, MD, USA (Government body) Lead