Chemical Entities of Biological Interest (ChEBI) is a free dictionary that describes 'small' chemical compounds. These compounds include distinct synthetic or natural atoms, molecules, ions, ion pairs, radicals, radical ions, complexes, conformers, etc. These molecular entities can interact with or affect the processes of living organisms.

How to cite this record: ChEBI; Chemical Entities of Biological Interest; DOI:; Last edited: Feb. 11, 2020, 10:11 a.m.; Last accessed: Apr 17 2021 4:20 p.m.

ChEBI: an open bioinformatics and cheminformatics resource.

Degtyarenko K, Hastings J, de Matos P, Ennis M
Curr Protoc Bioinformatics 2009

ChEBI: a database and ontology for chemical entities of biological interest.

Degtyarenko K, de Matos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcantara R, Darsow M, Guedj M, Ashburner M
Nucleic Acids Res 2007

The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013.

Hastings J, de Matos P, Dekker A, Ennis M, Harsha B, Kale N, Muthukrishnan V, Owen G, Turner S, Williams M, Steinbeck C
Nucleic Acids Res 2012

ChEBI in 2016: Improved services and an expanding collection of metabolites.

Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C
Nucleic Acids Res 2015

Chemical Entities of Biological Interest: an update.

de Matos P, Alcantara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C
Nucleic Acids Res 2009

Related Databases (46)
GlycoNAVI is a repository of data relevant to carbohydrate research. It contains a free suite of carbohydrate research tools organized by domain, including glycans, proteins, lipids, genes, diseases and samples.

BioSamples at the European Bioinformatics Institute
The BioSamples database aggregates sample information for reference samples (e.g. Coriell Cell lines) and samples for which data exist in one of the EBI's assay databases such as ArrayExpress, the European Nucleotide Archive or PRIDE. It provides links to assays and specific samples, and accepts direct submissions of sample information.

ChEMBL is an open, manually-curated, large-scale bioactivity database containing information from medicinal chemistry literature. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. Information regarding the compounds tested (including their structures), the biological or physicochemical assays performed on these and the targets of these assays are recorded in a structured form, allowing users to address a broad range of drug discovery questions.

EcoliWiki: A Wiki-based community resource for Escherichia coli
EcoliWiki is a community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.

IntAct molecular interaction database
IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

Rhea is a comprehensive and non-redundant resource of expert-curated chemical and transport reactions of biological interest. Rhea can be used for enzyme annotation, genome-scale metabolic modeling and omics-related analysis. Rhea describes enzyme-catalyzed reactions covering the IUBMB Enzyme Nomenclature list as well as additional reactions, including spontaneously occurring reactions. Rhea uses the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules to describe its reaction participants. Since December 2018, Rhea has been the standard for enzyme annotation in UniProt.

PomBase is a model organism database that provides organization of and access to scientific data for the fission yeast Schizosaccharomyces pombe. PomBase supports genomic sequence and features, genome-wide datasets and manual literature curation as well as providing structural and functional annotation and access to large-scale data sets.

Stem Cell Discovery Engine
Comparison system for cancer stem cell analysis

TTD, Therapeutic Target Database
The Therapeutic Target Database provides information about therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs directed at each of these targets. Also included in this database are links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties and enzyme nomenclature, as well as drug structure, therapeutic class and clinical development status. All information is fully referenced.

The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI Genbank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data (microarray or high-throughput RNA sequencing). Direct contributions or notices of publication of functional genomic data or bioinformatic analyses from archaeal research labs are very welcome.

WikiPathways is an open, collaborative platform dedicated to the curation of biological pathways. WikiPathways was established to facilitate the contribution and maintenance of pathway information by the biology community.

The Yeast Metabolome DataBase
The Yeast Metabolome Database (YMDB) is a manually curated database of small molecule metabolites found in or produced by Saccharomyces cerevisiae (also known as Baker’s yeast and Brewer’s yeast). This database covers metabolites described in textbooks, scientific journals, metabolic reconstructions and other electronic databases.

MetaboLights is a database for metabolomics studies, their raw experimental data and associated metadata. The database is cross-species and cross-technique and it covers metabolite structures and their reference spectra as well as their biological roles and locations. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information.

Golm Metabolome Database
The Golm Metabolome Database (GMD) provides gas chromatography (GC) mass spectrometry (MS) reference spectra, reference metabolite profiles and tools for one of the most widespread routine technologies applied to the large scale screening and discovery of novel metabolic biomarkers.

BindingDB database of measured binding affinities
BindingDB enables research by making a growing collection of high-quality, quantitative, protein-ligand binding data findable and usable. Funded by NIGMS/NIH.

Integrated relational Enzyme database
IntEnz is a freely available resource focused on enzyme nomenclature. IntEnz contains the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions.

Protein Data Bank in Europe
The Protein Data Bank in Europe (PDBe) is the European resource for the collection, organisation and dissemination of data on biological macromolecular structures. It is a founding member of the Worldwide Protein Data Bank (wwPDB), which performs the same role at a global level.

Reactome - a curated knowledgebase of biological pathways
The cornerstone of Reactome is a freely available, open source relational database of signaling and metabolic molecules and their relations organized into biological pathways and processes. The core unit of the Reactome data model is the reaction. Entities (nucleic acids, proteins, complexes, vaccines, anti-cancer therapeutics and small molecules) participating in reactions form a network of biological interactions and are grouped into pathways. Examples of biological pathways in Reactome include classical intermediary metabolism, signaling, transcriptional regulation, apoptosis and disease. Inferred orthologous reactions are available for 15 non-human species including mouse, rat, chicken, zebrafish, worm, fly, and yeast.

Extracellular Matrix Interaction Database
MatrixDB stores experimentally established interactions involving full-length proteins, matricryptins, glycosaminoglycans, lipids and cations. MatrixDB reports interactions with individual polypeptide chains or with multimers (e.g. collagens, laminins, thrombospondins) when appropriate. Multimers are treated as permanent complexes, referencing EBI identifiers when possible. Human interactions were inferred from non-human homologous interactions when available.

Rat Genome Database
The Rat Genome Database stores genetic, genomic, phenotype, and disease data generated from rat research. It provides access to corresponding data for eight other species, allowing cross-species comparison. Data curation is performed both manually and via an automated pipeline, giving RGD users integrated access to a wide variety of data to support their research.

Signaling Pathway Integrated Knowledge Engine
SPIKE (Signaling Pathway Integrated Knowledge Engine) is an interactive software environment that graphically displays biological signaling networks, allows dynamic layout and navigation through these networks, and enables the superposition of DNA microarray and other functional genomics data on interaction maps.

UniProt Knowledgebase
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The UniProt Knowledgebase consists of two sections: a reviewed section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis (aka "UniProtKB/Swiss-Prot"), and an unreviewed section with computationally analyzed records that await full manual annotation (aka "UniProtKB/TrEMBL").

The Human Metabolome Database
The Human Metabolome Database (HMDB) is a database containing detailed information about small molecule metabolites found in the human body. It contains or links (1) chemical, (2) clinical and (3) molecular biology/biochemistry data.

YEASTNET: A consensus reconstruction of yeast metabolism
This is a portal to the consensus yeast metabolic network as reconstructed from the genome sequence and literature. It is a highly annotated metabolic map of Saccharomyces cerevisiae S288c that is periodically updated by a team of collaborators from various research groups.

Kyoto Encyclopedia of Genes and Genomes
KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.

SIGnaling Network Open Resource
SIGNOR, the SIGnaling Network Open Resource, organizes signaling information published in the scientific literature and stores it in a structured format. The captured information is stored as binary causative relationships between biological entities and can be represented graphically as activity flow. The entire network can be freely downloaded and used to support logic modeling or to interpret high content datasets. Each relationship is linked to the literature reporting the experimental evidence. In addition, each node is annotated with the chemical inhibitors that modulate its activity. The signaling information is mapped to the human proteome even if the experimental evidence is based on experiments on mammalian model organisms.

SwissLipids is an expert-curated resource that provides a framework for the integration of lipid and lipidomic data with biological knowledge and models. SwissLipids is updated daily.

ENCODE Project
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active. ENCODE results from 2007 and later are available from this project. This covers data generated during the two production phases 2007-2012 and 2013-present.

Free knowledge database project hosted by Wikimedia and edited by volunteers.

DrugCentral is an online drug information resource that provides information on active ingredients, chemical entities, pharmaceutical products, drug mode of action, indications, and pharmacologic mode of action. DrugCentral monitors the FDA, EMA, and PMDA for new drug approvals on a regular basis to keep the resource current. This resource was created and is maintained by the Division of Translational Informatics at the University of New Mexico School of Medicine.

Target Central Resource Database
TCRD is the central resource behind the Illuminating the Druggable Genome Knowledge Management Center (IDG-KMC). TCRD contains information about human targets, with special emphasis on four families of targets that are central to the NIH IDG initiative: GPCRs, kinases, ion channels and nuclear receptors. Olfactory GPCRs (oGPCRs) are treated as a separate family. A key aim of the KMC is to classify the development/druggability level of targets. The official public portal for TCRD is Pharos. Based on modern web design principles, the Pharos interface provides facile access to all data types collected by the KMC. Given the complexity of the data surrounding any target, efficient and intuitive visualization has been a high priority, to enable users to quickly navigate and summarize search results and rapidly identify patterns. A critical feature of the interface is the ability to perform flexible search and subsequent drill down of search results. Underlying the interface is a RESTful API that provides programmatic access to all KMC data, allowing for easy consumption in user applications.

Enzyme Portal
The Enzyme Portal is for those interested in the biology of enzymes and proteins with enzymatic activity. It integrates publicly available information about enzymes, such as small-molecule chemistry, biochemical pathways and drug compounds. It contains enzyme-related information from resources developed at the EBI, and presents it via a unified user experience. The Enzyme Portal team does not curate enzyme information and therefore is a secondary information resource or portal.

A resource providing data on bioentities and their associated ontology terms for Plant Biology. The database provides access to ontology-based annotations of genes, phenotypes and germplasms from about 90 plant species. A number of internal and external ontologies are used to annotate the biological data available from this resource.

Alliance of Genome Resources
The primary mission of the Alliance of Genome Resources (the Alliance) is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease.

Patient-Derived tumor Xenograft Finder
PDX Finder is an open repository for the upload and storage of clinical, genomic and functional Patient-Derived Xenograft (PDX) data, which provides a comprehensive global catalogue of PDX models available for researchers across distributed repository databases. Integrated views are provided for histopathological image data, molecular classification of tumors, host mouse strain metadata, tumor genomic data and metrics on tumor response to chemotherapeutics. The data model for PDX Finder is based on the minimal information standard for PDX models developed in collaboration with a broad range of stakeholders who create and/or use PDX models in basic and pre-clinical cancer research.

OBO Foundry
The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate. To achieve this, OBO Foundry participants voluntarily adhere to and contribute to the development of an evolving set of principles including open use, collaborative development, non-overlapping and strictly-scoped content, and common syntax and relations, based on ontology models that work well, such as the Gene Ontology (GO). The OBO Foundry is overseen by an Operations Committee with Editorial, Technical and Outreach working groups.

Bacterial Diversity Metadatabase
BacDive, the Bacterial Diversity Metadatabase, merges detailed strain-linked information on the different aspects of bacterial and archaeal biodiversity. BacDive contains entries for over 63,000 strains and provides information on their taxonomy, morphology, physiology, sampling and concomitant environmental conditions as well as molecular biology.

Complex Portal
The Complex Portal is a manually curated, encyclopaedic resource of macromolecular complexes from a number of key model organisms. The majority of complexes are made up of proteins but may also include nucleic acids or small molecules. All data is freely available for search and download.

TargetMine integrates many types of data for human, rat and mouse. Flexible queries, export of results and data analysis are available.

DISNOR is a resource that uses a comprehensive collection of disease-associated genes, as annotated in DisGeNET, to interrogate SIGNOR in order to assemble disease-specific logic networks linking disease-associated genes by causal relationships. DISNOR is an open resource where more than 4000 disease networks, linking ~2800 disease genes, can be explored. For each disease curated in DisGeNET, DISNOR links disease genes through manually annotated causal relationships, and the inferred 'patho-pathways' can be visualised at different levels of complexity.

Signaling Pathways Project
The Signaling Pathways Project is an integrated 'omics knowledgebase based upon public, manually curated transcriptomic and cistromic (ChIP-Seq) datasets involving genetic and small molecule manipulations of cellular receptors, enzymes and transcription factors. Our goal is to create a resource where scientists can routinely generate research hypotheses or validate bench data relevant to cellular signaling pathways.

MolMeDB: Molecules on Membranes Database
MolMeDB is an open chemistry database concerning the interaction of molecules with membranes.

DES-TOMATO is a topic-specific literature exploration system developed to allow the exploration of information related to tomato. The information provided in DES-TOMATO is obtained through the text-mining of available scientific literature, namely full-length articles in PubMed Central and titles and abstracts in PubMed.

DNA Modification Database
DNAmod is an open-source database that catalogues DNA modifications and provides a single source to learn about their properties. The database annotates the chemical properties and structures of all curated modified DNA bases, and a much larger list of candidate chemical entities. DNAmod includes manual annotations of available sequencing methods, descriptions of their occurrence in nature, and provides existing and suggested nomenclature. DNAmod enables researchers to rapidly review previous work, select mapping techniques, and track recent developments concerning modified bases of interest.

NERC Vocabulary Server
The NERC Vocabulary Server provides access to standardised lists of terms and taxonomies related to a wide range of concepts which are used to facilitate data markup, interoperability and discovery in the marine and associated earth science domains. Some of these vocabularies are fully managed by the British Oceanographic Data Centre (BODC), others are managed by BODC on behalf of other organisations, while some are owned and, when relevant, managed by external governance authorities.

Pharmacokinetics Database
PK-DB is an open database for pharmacokinetics information from clinical trials as well as pre-clinical research. The focus of PK-DB is to provide high-quality pharmacokinetics data enriched with the required meta-information for computational modeling and data integration.

Grant Number(s)

  • 021902 (European Commission grant, FELICS)

  • 226073 (European Union, Serving Life-science Information for the Next Generation (SLING))

  • BB/G022747/1 (Biotechnology and Biological Sciences Research Council (BBSRC), UK)

  • BB/K019783/1 (Biotechnology and Biological Sciences Research Council (BBSRC), UK)