DOI: 10.25504/FAIRsharing.dnk0f6

Generic Feature Format Version 3

Abbreviation: GFF3

General Information
The Generic Feature Format Version 3 (GFF3) format was developed after earlier formats, although widely used, became fragmented into multiple incompatible dialects. The GFF3 format addresses the most common extensions to GFF, while preserving backward compatibility with previous formats. GFF3 files are nine-column, tab-delimited, plain text files. Literal tab, newline, carriage return, percent (%) sign, and control characters must be encoded using RFC 3986 Percent-Encoding; no other characters may be encoded. Backslash and other ad-hoc escaping conventions that have been added to the GFF format are not allowed. The file contents may include any character in the set supported by the operating environment, although for portability with other systems, the use of Latin-1 or Unicode is recommended.
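The encoding rule and the nine-column layout described above can be illustrated with a minimal Python sketch. It is not a reference implementation: the function names (`encode_field`, `decode_field`, `split_record`) and the sample values in the usage note are hypothetical, and the set of encoded characters is taken only from the description above (tab, newline, carriage return, percent sign, and control characters). The full GFF3 specification reserves further characters in some columns, so the sketch leaves columns undecoded after splitting and lets the caller decode individual values.

```python
import re
from urllib.parse import unquote

# Characters the description says must be percent-encoded:
# control characters (which include tab, newline, carriage return) and '%'.
_MUST_ENCODE = re.compile(r"[\x00-\x1f\x7f%]")


def encode_field(text):
    """Percent-encode only the characters listed in the description."""
    return _MUST_ENCODE.sub(lambda m: "%{:02X}".format(ord(m.group())), text)


def decode_field(text):
    """Reverse RFC 3986 percent-encoding in a single value."""
    return unquote(text)


def split_record(line):
    """Split one feature line into its nine tab-delimited columns.

    Columns are returned undecoded; decode individual values afterwards
    so that encoded separators are not misread as structure.
    """
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) != 9:
        raise ValueError("expected 9 columns, found {}".format(len(columns)))
    return columns
```

For example, `encode_field("50%\tdone")` yields `50%25%09done`, and `decode_field` reverses it. A line with fewer or more than nine tab-separated columns raises `ValueError` rather than being silently accepted.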

How to cite this record: GFF3; Generic Feature Format Version 3; DOI:; Last edited: March 5, 2020, 12:07 p.m.; Last accessed: Mar 07 2021 3:05 p.m.

This record is maintained by keilbeck

Record updated: March 5, 2020, 12:06 p.m. by The FAIRsharing Team.


TeSS training resources 

Python programming primer


No XSD schemas defined

Access / Retrieve Data

Conditions of Use


No publications available

Related Standards

Reporting Guidelines

No guidelines defined

Terminology Artifacts

Models and Formats

Identifier Schemas

No identifier schema standards defined


No metrics standards defined

Related Databases (51)
A genetic database for attention deficit hyperactivity disorder. ADHDgene aims to provide the research community with a central genetic resource and analysis platform for ADHD, to help unveil the genetic basis of ADHD and to contribute to global mental health.

Aspergillus Genome Database
The Aspergillus Genome Database is a resource for genomic sequence data as well as gene and protein information for Aspergilli. This publicly available repository is a central point of access to genome, transcriptome and polymorphism data for the fungal research community.

Central Aspergillus Data REpository
This project aims to support the international Aspergillus research community by gathering all genomic information regarding this significant genus into one resource - The Central Aspergillus REsource (CADRE). CADRE facilitates visualisation and analyses of data using the Ensembl software suite. Much of our data has been extracted from GenBank and augmented with the consent of the original sequencing groups. This additional work has been carried out using both automated and manual efforts, with support from specific annotation projects and the general Aspergillus community.

EcoliWiki: A Wiki-based community resource for Escherichia coli
EcoliWiki is a community-based resource for the annotation of all non-pathogenic E. coli, its phages, plasmids, and mobile genetic elements.

Genetic, genomic and molecular information pertaining to the model organism Drosophila melanogaster and related sequences. This database also contains information relating to human disease models in Drosophila, the use of transgenic constructs containing sequence from other organisms in Drosophila, and information on where to buy Drosophila strains and constructs.

Fungal and Oomycete genomics resource
FungiDB is an integrated genomic and functional genomic database for the kingdom Fungi. The database integrates whole genome sequence and annotation and also includes experimental and environmental isolate sequence data. The database includes comparative genomics, analysis of gene expression, and supplemental bioinformatics analyses and a web interface for data-mining.

Integrated Microbial Genomes And Microbiomes
The Integrated Microbial Genomes (IMG/M) aims to support the annotation, analysis and distribution of microbial genome and microbiome datasets sequenced at DOE's Joint Genome Institute (JGI). It also serves as a community resource for analysis and annotation of genome and metagenome datasets in a comprehensive comparative context. The IMG data warehouse integrates genome and metagenome datasets provided by IMG users with a set of publicly available genome and metagenome datasets. IMG/M is also open to scientists worldwide for the annotation, analysis, and distribution of their own genome and microbiome datasets, as long as they agree with the IMG/M data release policy and follow the metadata requirements for integrating data into IMG/M.

modMine is an integrated web resource of data and tools to browse and search modENCODE data and experimental details, download results and access the GBrowse genome browser.

Nucleic Acid Phylogenetic Profile
NAPP (Nucleic Acids Phylogenetic Profile) enables users to retrieve RNA-rich clusters from any genome in a list of 1000+ sequenced bacterial genomes. RNA-rich clusters can be viewed separately or, alternatively, all tiles from RNA-rich clusters can be contiged into larger elements and retrieved at once as a CSV or GFF file for use in a genome browser or comparison with other predictions/RNA-seq experiments.

Simple Modular Architecture Research Tool
SMART (Simple Modular Architecture Research Tool) is a web resource providing simple identification and extensive annotation of protein domains and the exploration of protein domain architectures. It allows the identification and annotation of genetically mobile domains and the analysis of domain architectures. More than 500 domain families found in signalling, extracellular and chromatin-associated proteins are detectable. These domains are extensively annotated with respect to phyletic distributions, functional class, tertiary structures and functionally important residues. Each domain found in a non-redundant protein database as well as search parameters and taxonomic information are stored in a relational database system. User interfaces to this database allow searches for proteins containing specific combinations of domains in defined taxa.

Human disease methylation database
The human disease methylation database, DiseaseMeth, is a web-based resource focused on the aberrant methylomes of human diseases. Large-scale methylation data have recently become available and continue to grow, and more information about human diseases can be mined from them. Our mission is to provide a curated set of methylation information datasets and tools in the human genome, to support and promote research in this area. In particular, we provide a genome-scale landscape to show human methylation information in a scalable and flexible manner.

This resource is the home page of the parasitic nematode EST project at The Genome Institute at Washington University in St. Louis. The site was established in 2000 as a component of the NIH-NIAID grant "A Genomic Approach to Parasites from the Phylum Nematoda". While started as a project site, over the years it became a community resource dedicated to the study of parasitic nematodes.

SNPedia is a wiki resource of the functional consequences of human genetic variation as published in peer-reviewed studies. Entries are formatted to allow associations to be assigned to single genotypes as well as sets of genotypes (genosets). Curation occurs through editorial, community/user, and semi-automated processes.

The UCSC Archaeal Genome Browser
The UCSC Archaeal Genome Browser is a window on the biology of more than 100 microbial species from the domain Archaea. Basic gene annotation is derived from NCBI Genbank/RefSeq entries, with overlays of sequence conservation across multiple species, nucleotide and protein motifs, non-coding RNA predictions, operon predictions, and other types of bioinformatic analyses. In addition, we display available gene expression data (microarray or high-throughput RNA sequencing). Direct contributions or notices of publication of functional genomic data or bioinformatic analyses from archaeal research labs are very welcome.

Virus Pathogen Database and Analysis Resource
The Virus Pathogen Database and Analysis Resource (ViPR) is an integrated repository of data and analysis tools for multiple virus families, supported by the National Institute of Allergy and Infectious Diseases (NIAID) Bioinformatics Resource Centers (BRC) program. ViPR captures various types of information, including sequence records, gene and protein annotations, 3D protein structures, immune epitope locations, clinical and surveillance metadata and novel data derived from comparative genomics analysis. The database is available without charge as a service to the virology research community to help facilitate the development of diagnostics, prophylactics and therapeutics for priority pathogens and other viruses.

A comprehensive online knowledgebase for the monkey research community.

Mammalian Protein Localization Database
LOCATE is a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set.

cis-Regulatory Element Database
The cisRED database holds conserved sequence motifs identified by genome scale motif discovery, similarity, clustering, co-occurrence and coexpression calculations. Sequence inputs include low-coverage genome sequence data and ENCODE data.

YanHuang - YH1 Genome Database
The YH database presents the entire DNA sequence of a Han Chinese individual, as a representative of the Asian population. This genome, named YH, is the start of the YanHuang Project, which aims to sequence 100 Chinese individuals in 3 years. The genome was assembled based on 3.3 billion reads (117.7 Gbp of raw data) generated by the Illumina Genome Analyzer. In total, 102.9 Gbp of nucleotides were mapped onto the NCBI human reference genome (Build 36) by the self-developed software SOAP (Short Oligonucleotide Alignment Program), and 3.07 million SNPs were identified.

National Microbial Pathogen Data Resource
The NMPDR provided curated annotations in an environment for comparative analysis of genomes and biological subsystems, with an emphasis on the food-borne pathogens Campylobacter, Listeria, Staphylococcus, Streptococcus, and Vibrio; as well as the STD pathogens Chlamydiaceae, Haemophilus, Mycoplasma, Neisseria, Treponema, and Ureaplasma.

Tandem Repeats Database
Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA and contains a variety of tools for their analysis.

The Chromosome 7 Annotation Project
The objective of this project is to generate the most comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications.

RNAiDB provides access to results from RNA interference (RNAi) studies in C. elegans, including images, movies, phenotypes, and graphical maps.

Ensembl creates, integrates and distributes reference datasets and analysis tools that enable genomics. Ensembl is a genome browser that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data.

CryptoDB serves as the functional genomics database for Cryptosporidium and related species. CryptoDB is a free, online resource for accessing and exploring genome sequence and annotation, functional genomics data, isolate sequences, and orthology profiles across organisms. It also includes supplemental bioinformatics analyses and a web interface for data-mining.

Eukaryotic Pathogen, Vector and Host Informatics Resource
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB) focuses on eukaryotic pathogens and invertebrate vectors of infectious diseases, encompassing data from prior resources devoted to parasitic species (EuPathDB), fungi (FungiDB) and vector species (VectorBase). While each of the taxonomic groups within this resource is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all of these resources, and the opportunity to leverage orthology for searches across genera.

A detailed study of Giardia lamblia's genome will provide insights into an early evolutionary stage of eukaryotic chromosome organization as well as other aspects of the prokaryotic / eukaryotic divergence.

MicrosporidiaDB is one of the databases that can be accessed through the EuPathDB (formerly ApiDB) portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

PlasmoDB is a genome database for the genus Plasmodium, a set of single-celled eukaryotic pathogens that cause human and animal diseases, including malaria.

ToxoDB is a free online resource that provides access to genomic and functional genomic data for Toxoplasma and related organisms. The resource contains over 30 fully sequenced and annotated genomes, with genomic sequence from multiple strains available for variant detection and copy number variation analysis. In addition to genomic sequence data, ToxoDB contains functional genomic datasets including microarray, RNAseq, proteomics, ChIP-seq, and phenotypic data. In addition, results from a number of whole-genome analyses are incorporated, including mapping to orthology clusters, which allows users to leverage phylogenetic relationships in their analyses.

TrichDB is one of the databases that can be accessed through the EuPathDB (formerly ApiDB) portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

TriTrypDB is one of the databases that can be accessed through the VEuPathDB portal, covering eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the VEuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera.

Annotated regulatory Binding Sites from Orthologous Promoters
ABS: A database of Annotated regulatory Binding Sites from known binding sites identified in promoters of orthologous vertebrate genes.

Drosophila Species Genomes
The D. melanogaster and eight other eukaryote model genomes, and gene predictions from several groups. Summaries of essential genome statistics include sizes, genes found and predicted, homology among genomes, phylogenetic trees of species, and comparisons of several gene predictions for sensitivity and specificity in finding new and known genes.

Maize Genetics and Genomics Database
MaizeGDB is the maize research community's central repository for genetics and genomics information.

Rat Genome Database
The Rat Genome Database stores genetic, genomic, phenotype, and disease data generated from rat research. It provides access to corresponding data for eight other species, allowing cross-species comparison. Data curation is performed both manually and via an automated pipeline, giving RGD users integrated access to a wide variety of data to support their research.

PAZAR is a software framework for the construction and maintenance of regulatory sequence data annotations; a framework which allows multiple boutique databases to function independently within a larger system (or information mall). The goal of PAZAR is to be the public repository for regulatory data.

Pseudomonas Genome DB
The Pseudomonas Genome Database is a resource for peer-reviewed, continually updated annotation for all Pseudomonas species. It includes gene and protein sequence information, as well as regulation and predicted function and annotation.

Type 1 Diabetes Database
T1DBase focuses on two research areas in type 1 diabetes (T1D): the genetics of T1D susceptibility and beta cell biology.

UniProt Knowledgebase
The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to capturing the core data mandatory for each UniProtKB entry (mainly, the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. This includes widely accepted biological ontologies, classifications and cross-references, and clear indications of the quality of annotation in the form of evidence attribution of experimental and computational data. The UniProt Knowledgebase consists of two sections: a reviewed section containing manually-annotated records with information extracted from literature and curator-evaluated computational analysis (aka "UniProtKB/Swiss-Prot"), and an unreviewed section with computationally analyzed records that await full manual annotation (aka "UniProtKB/TrEMBL").

A Systematic Annotation Package
ASAP is a relational database and web interface developed to store, update and distribute genome sequence data and gene expression data. It was designed to facilitate ongoing community annotation of genomes and to grow with genome projects as they move from the preliminary data stage through post-sequencing functional analysis.

Poxvirus Bioinformatics Resource Center
The Poxvirus Bioinformatics Resource Center was established to provide specialized web-based resources to the scientific community studying poxviruses. This resource is no longer being maintained. For tools and data supporting virus genomics, especially related to poxviruses and other large DNA viruses, please visit the Viral Bioinformatics site maintained by our collaborator, Chris Upton. For information on virus taxonomy, please visit the ICTV web site.

VectorBase is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes (as well as a number of non-vector genomes for comparative analysis) providing an integrated resource for the research community. VectorBase contains genome information for organisms such as Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes.

FlyMine is an integrated database of genomic, expression and protein data for Drosophila, Anopheles and C. elegans. Integrating data makes it possible to run sophisticated data mining queries that span domains of biological knowledge.

Genetic and Genomic Information System
GnpIS is a multispecies integrative information system dedicated to plant and fungi pests. It bridges genetic and genomic data, allowing researchers access to both genetic information (e.g. genetic maps, quantitative trait loci, association genetics, markers, polymorphisms, germplasms, phenotypes and genotypes) and genomic data (e.g. genomic sequences, physical maps, genome annotation and expression data) for species of agronomical interest. GnpIS is used by both large international projects and plant science departments at the French National Institute for Agricultural Research. It is regularly improved and released several times per year. GnpIS is accessible through a web portal and allows users to browse different types of data either independently through dedicated interfaces or simultaneously using a quick search ('Google-like search') or advanced search (Biomart, Galaxy, Intermine) tools.

Banana Genome Hub
The Banana Genome Hub centralises databases of genetic and genomic data for the Musa acuminata crop, and is the official portal for the Musa genome resources.

Visual Database for Organelle Genome
VDOG, the Visual Database for Organelle Genome, is an innovative database of genome information in organelles. Most of the data in VDOG were originally extracted from GenBank, then re-organized and represented.

MirGeneDB is a database of microRNA genes that have been validated and annotated as described in "A Uniform System for the Annotation of Vertebrate microRNA Genes and the Evolution of the Human microRNAome". The initial version contained 1,434 microRNA genes for human, mouse, chicken and zebrafish. Version 2.0 contains more than 10,000 genes from 45 organisms representing nearly every major metazoan group, and these microRNAs can be browsed, searched and downloaded.

Regulatory Element Database for Drosophila
REDfly is a curated collection of known Drosophila transcriptional cis-regulatory modules (CRMs) and transcription factor binding sites (TFBSs). REDfly seeks to include all experimentally verified fly regulatory elements along with their DNA sequence, their associated genes, and the expression patterns they direct.

InterMine was formed in 2002 at the University of Cambridge, originally as a Drosophila-dedicated resource, before expanding to become organism-agnostic, enabling a large range of organisations around the world to create their own InterMines. There are many instances of InterMine installations, relating to particular model organisms. These can be searched individually or via a cross-Mine search function.

Information Commons for Rice
Information Commons for Rice (IC4R) is a rice knowledgebase that incorporates rice data through multiple modules such as genome-wide expression profiles derived entirely from RNA-Seq data, resequencing-based genomic variations obtained from re-sequencing data of thousands of rice varieties, plant homologous genes covering multiple diverse plant species, post-translational modifications, rice-related literature and gene annotations contributed by the rice research community.

Implementing Policies

This record is not implemented by any policy.


Record Maintainer