Sequence Ontology
Abbreviation: SO
Homepage http://www.sequenceontology.org/
Countries that developed this resource Worldwide
Created in 2004
Taxonomic range
Knowledge Domains
Subjects
How to cite this record FAIRsharing.org: SO; Sequence Ontology; DOI: https://doi.org/10.25504/FAIRsharing.6bc7h9; Last edited: Feb. 4, 2020, 11 a.m.; Last accessed: Jan 21 2021 6:20 p.m.
Publication for citation The Sequence Ontology: a tool for the unification of genome annotations. Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M. Genome Biology; 2005; 10.1186/gb-2005-6-5-r44
Record updated: Feb. 4, 2020, 11 a.m. by The FAIRsharing Team.
forum | GitHub Issue Tracker |
Mailing List | SO Mailing list |
online documentation | SO Wiki Pages |
online documentation | GitHub Repository |
Additional Information
Contact | Karen Eilbeck ORCID |
External Links
Bioportal | http://bioportal.bioontology.org/ontologies/SO |
OBO | http://obofoundry.org/ontology/so.html |
Genome Annotation Library | https://github.com/The-Sequence-Ontology/GAL | 0.01 |
Sequence Ontology Bioinformatics Analysis | https://github.com/The-Sequence-Ontology/SOBA |
No XSD schemas defined
Conditions of Use
Applies to: Data use
Data Release
Download (OWL) | http://purl.obolibrary.org/obo/so.owl |
Data Curation
Request a term | https://github.com/The-Sequence-Ontology/SO-Ontologies/issues |
Data Access
Search and Browse SO | http://www.sequenceontology.org/browser/obob.cgi |
AgroPortal: Browse / Download | http://agroportal.lirmm.fr/ontologies/SO |
The Sequence Ontology: a tool for the unification of genome annotations.
Eilbeck K, Lewis SE, Mungall CJ, Yandell M, Stein L, Durbin R, Ashburner M.
Genome Biology 2005
Evolution of the Sequence Ontology terms and relationships.
Mungall CJ, Batchelor C, Eilbeck K.
J Biomed Inform. 2010
A standard variation file format for human genome sequences.
Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K.
Genome Biol. 2010
Reporting Guidelines
Terminology Artifacts
- Expressed Sequence Annotation for Humans
- Ontology for Parasite LifeCycle
- Neuroscience Information Framework Standard Ontology
- ImMunoGeneTics Ontology
- Ontology for MicroRNA Target
- HGNC Gene Symbols, Gene Names and IDs
- Ontology for Genetic Interval
- PRotein Ontology
- Fungal Gross Anatomy Ontology
- Feature Annotation Location Description Ontology
- Influenza Ontology
- Fission Yeast Phenotype Ontology
- Single Nucleotide Polymorphism Ontology
- Variation Ontology
- Bacterial interlocked Process ONtology
- Non-coding RNA Ontology
- Genomic Feature and Variation Ontology
- Proteasix Ontology
- Domain Resource Application Ontology
Models and Formats
Identifier Schemas
No identifier schema standards defined
Metrics
No metrics standards defined
Community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.
FlyBase
Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs containing sequence from other organisms in Drosophila, and information on where to buy Drosophila strains and constructs.
Fungal and Oomycete genomics resource
FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes comparative genomics, analysis of gene expression, and supplemental bioinformatics analyses and a web interface for data-mining.
modMine
modMine is an integrated web resource of data and tools to browse and search modENCODE data and experimental details, download results and access the GBrowse genome browser.
Saccharomyces Genome Database
The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. SGD contains a variety of biological information and tools with which to search and analyze it.
The Arabidopsis Information Resource
The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.
PomBase
PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as well as providing structural and functional annotation and access to large-scale data sets.
Stem Cell Discovery Engine
Comparison system for cancer stem cell analysis
The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI GenBank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data (microarray or high-throughput RNA sequencing). Direct contributions or notices of publication of functional genomic data or bioinformatic analyses from archaeal research labs are very welcome.
WormBase
WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes.
Gramene: A curated, open-source, integrated data resource for comparative functional genomics in plants
Gramene's purpose is to provide added value to plant genomics data sets available within the public sector, which will facilitate researchers' ability to understand the plant genomes and take advantage of genomic sequence known in one species for identifying and understanding corresponding genes, pathways and phenotypes in other plant species. It represents a broad spectrum of species ranging from unicellular photo-autotrophs, algae, monocots, dicots and other important taxonomic clades. Within Plant Reactome, a database portal of Gramene, there are over 60 plant genomes as well as pathways for more than 80 species.
Daphnia Water Flea Genome Database
wFleaBase includes data from all species of the genus, yet the primary species are Daphnia pulex and Daphnia magna, because of the broad set of genomic tools that have already been developed for these animals.
Mouse Genome Database - a Mouse Genome Informatics (MGI) Resource
MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. Data includes gene characterization, nomenclature, mapping, gene homologies among mammals, sequence links, phenotypes, allelic variants and mutants, and strain data.
The Zebrafish Information Network
The Zebrafish Information Network, ZFIN, serves as the primary community database resource for the laboratory use of zebrafish. We develop and support integrated zebrafish genetic, genomic, developmental and physiological information and link this information extensively to corresponding data in other model organism and human databases.
VectorBase
VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community. Currently, VectorBase contains genome information for 38 organisms including Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever. Recent additions include large scale variant (SNP) datasets and population genetics data (genotype/phenotype).
European Variation Archive
The European Variation Archive is an open-access archive that accepts submission of, and provides access to, all types of genetic variation data from all species. All users are able to download any dataset, or query our study catalogue via our variation table. Access to EVA data is also provided by RESTful web services for a variety of applications, such as annotation pipelines.
FlyMine
FlyMine is an integrated database of genomic, expression and protein data for Drosophila, Anopheles and C. elegans. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge.
MouseMine @ MGI
A database of integrated mouse data from MGI, powered by InterMine. MouseMine is a member of InterMOD, a consortium of model organism databases dedicated to making cross-species data analysis easier through ongoing coordination and collaborative system development.
ClinVar
ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible.
ENCODE Project
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE results from 2007 and later are available from this project. This covers data generated during the two production phases 2007-2012 and 2013-present.
Ascidian Network for In Situ Expression and Embryological Data
Aniseed is a database designed to offer a representation of ascidian embryonic development at the level of the genome (cis-regulatory sequences, spatial gene expression, protein annotation), of the cell (cell shapes, fate, lineage) or of the whole embryo (anatomy, morphogenesis).
dictyBase
dictyBase is a single-access database for the complete genome sequence and expression data of four Dictyostelid species providing information on research, genome and annotations. There is also a repository of plasmids and strains held at the Dicty Stock Centre. Relevant literature is integrated into the database, and gene models and functional annotation are manually curated from experimental results and comparative multigenome analyses.
Comprehensive Antibiotic Resistance Database
A bioinformatic database of antimicrobial resistance genes, their products and associated phenotypes.
Open Targets
Open Targets is a data integration platform for access to and visualisation of potential drug targets associated with disease. Each drug target is linked to a disease using integrated genome-wide data from a broad range of data sources.
Rfam
The Rfam database is a collection of RNA families, each represented by multiple sequence alignments, consensus secondary structures and covariance models (CMs). The families in Rfam break down into three broad functional classes: non-coding RNA genes, structured cis-regulatory elements and self-splicing RNAs. These functional RNAs often have a conserved secondary structure, which may be better preserved than the RNA sequence.
Hardwood Genomics Project
The Hardwood Genomics Project is a database of expressed genes, genetic markers, genetic linkage maps, and reference populations. It provides lasting genomic and biological resources for the discovery and conservation of genes in hardwood trees for growth, adaptation and responses to environmental stresses such as drought, heat, insect pests and disease. All original sequence data is being deposited in NCBI's Sequence Read Archive and the genetic linkage maps and associated marker data will be available at the Dendrome database.
Target Pathogen
The Target-Pathogen database is a bioinformatic approach to prioritize drug targets in pathogens. Available genomic data for pathogens has created new opportunities for drug discovery and development, including new species, resistant and multiresistant ones. However, this data must be cohesively integrated to be fully exploited and be easy to interrogate. Target-Pathogen has been designed and developed as an online resource to allow genome-wide data consolidation from diverse sources, focusing on structural druggability, essentiality and metabolic role of proteins. By allowing the integration and weighting of this information, this bioinformatic tool aims to facilitate the identification and prioritization of candidate drug targets for pathogens. With the structurome and drugome information, Target-Pathogen is a unique resource to analyze whole genomes of relevant pathogens.
CottonGen
CottonGen is a cotton community genomics, genetics and breeding database being developed to enable basic, translational and applied research in cotton. It is being built using the open-source Tripal database infrastructure. CottonGen supersedes CottonDB and the Cotton Marker Database, with enhanced tools for easier data sharing, mining, visualization and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts.
OBO Foundry
The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate. To achieve this, OBO Foundry participants voluntarily adhere to and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, and common syntax and relations, based on ontology models that work well, such as the Gene Ontology (GO). The OBO Foundry is overseen by an Operations Committee with Editorial, Technical and Outreach working groups.
Citrusgreening.org
Huanglongbing (HLB) is a tritrophic disease complex involving citrus host trees, the Asian citrus psyllid (ACP) insect and a phloem-restricted bacterial pathogen, Candidatus Liberibacter asiaticus (CLas). HLB is considered to be the most devastating of all citrus diseases, and there is currently no adequate control strategy. Citrusgreening.org is a database for the host, vector and pathogen involved in citrus greening disease.
Bovine Genome Database
The Bovine Genome Database project is developed to support the efforts of bovine genomics researchers by providing data mining, genome navigation and annotation tools for the bovine reference genome based on the Hereford cow, L1 Dominette 01449.
DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources
DECIPHER (Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources) is an interactive web-based resource that incorporates a suite of tools designed to aid the interpretation of submicroscopic chromosomal imbalance, inversions, and translocations.
LNCipedia
LNCipedia is a database for human long non-coding RNA (lncRNA) transcripts and genes. In addition to basic transcript information and gene structure, several statistics are determined for each entry in the database, such as secondary structure information, protein coding potential and microRNA binding sites. Available literature on specific lncRNAs is linked, and users or authors can submit articles through a web interface. LNCipedia is publicly available and allows users to query and download lncRNA sequences and structures based on different search criteria.
Database of Genomic Variants
The Database of Genomic Variants (DGV) is a publicly accessible, comprehensive curated catalogue of structural variation (SV) found in the genomes of control individuals from worldwide populations.
BovineMine
BovineMine integrates the bovine reference genome assembly with many other biological data sets, including genomes of other species. The sheep and goat genomes allow comparison across ruminants. Model organism data (human, mouse, rat) allow well-curated data sets to be applied to ruminants using orthology.
CHOmine
CHOmine integrates many types of data for Cricetulus griseus and CHO cells. You can run flexible queries, export results and analyse lists of data.
Agronomic Linked Data
The Agronomic Linked Data (AgroLD) is a knowledge-based system relying on Semantic Web technologies and exploiting standard domain ontologies, which integrates data about plant species of high interest for the plant science community. AgroLD is an RDF knowledge base of 100M triples created by annotating and integrating more than 50 datasets from 10 data sources and linked using 10 ontologies.
InterMine
InterMine was formed in 2002 at the University of Cambridge, originally as a Drosophila-dedicated resource, before expanding to become organism-agnostic, enabling a large range of organisations around the world to create their own InterMines. There are many instances of InterMine installations, relating to particular model organisms. These can be searched individually or via a cross-Mine search function.
This record is not implemented by any policy.
Record Maintainer
This record is maintained by keilbeck
Funds
U.S. National Library of Medicine (Government body)
National Human Genome Research Institute (NHGRI), Bethesda, MD, USA (Government body)
Maintains
SO administrators (Consortium) Lead
Department of Human Genetics, University of Utah, USA (University)
Grant Number(s)
1RC2HG005619 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)
2R44HG002991 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)
2R44HG003667 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)
5R01HG004341 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)
HG004341 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)
P41HG002273 (National Human Genome Research Institute (NHGRI), Bethesda, MD, USA)