Briefings in Functional Genomics and Proteomics Advance Access published online on October 29, 2007

Briefings in Functional Genomics and Proteomics, doi:10.1093/bfgp/elm025

© Oxford University Press, 2007, All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

Genome browsing with Ensembl: a practical overview

Giulietta Spudich, Xosé M. Fernández-Suárez and Ewan Birney

Corresponding author: Ewan Birney, EMBL Outstation – EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK. Tel: +44(0)1223 494992; Fax: +44(0)1223 494468; E-mail: birney{at}ebi.ac.uk


    ABSTRACT
 TOP
 ABSTRACT
 OVERVIEW
 INTRODUCTION TO THE ENSEMBL...
 THE ENSEMBL GENE SET
 ORTHOLOGY/PARALOGY PREDICTION
 DATA ACCESS
 THE BROWSER: OVERVIEW AND...
 SUMMARY
 FUTURE GOALS
 References
 
A wealth of gene information is accruing in public databases. Genome browsers such as Ensembl are needed to organize and depict this information in the context of the genome. Ensembl provides an open source gene set based on experimental evidence for over 30 species, the majority of which are vertebrates. Genes and annotation are accessible through the Ensembl browser (http://www.ensembl.org), and through direct queries of its databases using the Perl API (Application Programming Interface), MySQL or BioMart.

Keywords: Ensembl, genome browser, annotation, data mining, gene prediction, comparative genomics


    OVERVIEW
 
This article presents a general introduction to Ensembl focusing on the basis for the gene set, data upload using DAS, data access and comparative genomics. The introductory sections are followed by a series of modules describing practical aspects of using the browser. The specific modules and what they cover are as follows:

Module 1: How to view information for one gene (including the biological basis for the gene prediction and an introduction to viewing external data with DAS).

Module 2: How to view a region of the chromosome and associated annotation.

Module 3: How to view SNPs and other variations.

Module 4: How to view homologies and alignments.

Module 5: How to query the Ensembl database with BioMart.


    INTRODUCTION TO THE ENSEMBL PROJECT
 
The Ensembl project (Figure 1) [1–4] aims to provide an up-to-date gene set with associated annotation on the most recent assemblies for vertebrate species (including human [5, 6], mouse [7], rat [8] and zebrafish [9]) along with several model organisms commonly used in scientific studies (including yeast [10], Caenorhabditis elegans [11] and fruit fly [12]). Ensembl stands next to the genome browsers of UCSC (University of California, Santa Cruz, http://genome.ucsc.edu/) and NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/mapview/) as one of the three most-used genome browsers in the world today. Ensembl version 43 (http://feb2007.archive.ensembl.org/index.html) included 31 annotated genomes (Table 1). At the time of publication (September 2007), the number had increased to 35. The project is open source and is used worldwide by a broad range of scientists, from geneticists to molecular biologists to bioinformaticians, in both the research community and scientific industry. This meets Ensembl's original aim to provide a free and comprehensive resource to the scientific community. Ensembl was established in 1999 as a joint project between the EBI (EMBL) and the Wellcome Trust Sanger Institute, with additional funding from NIH–NIAID, EU, BBSRC and MRC.


Figure 1: The main page of Ensembl at www.ensembl.org. The homepage provides a list of all species in the Ensembl browser, along with assembly information. Links to help and documentation and tools such as BLAST and BioMart are provided, along with news for each release, and a search function. A tab marks the link to the Pre! site.

 

Table 1: Genomes available in Ensembl as of February 2007 (version 43), along with the date and source of the ‘Gene Build’ and the assembly used. Species marked by an asterisk are those with new sequence assemblies that have not yet been fully annotated (found in the ‘Pre! site’, accessible by a tab in the main page).

 
All Ensembl gene predictions are based on biological evidence, specifically mRNA and protein evidence from public databases such as UniProt [13] and RefSeq [14]. See the section on the ‘Ensembl gene set’ in this article. Associated annotation ranges from sequence variation to functional classes on the protein level and includes SNPs (single nucleotide polymorphisms), in-dels (insertion–deletion mutations), clone sets, protein domains and functional classes such as GO (Gene Ontology [15]) terms. Expression data from the GNF (Genomics Institute of the Novartis Research Foundation) [16] project and eVOC (Expressed Sequence Annotation for Humans) ontologies [17] can be accessed from the database for human. These projects involve the determination of the location (tissue type/organ) and timing of gene expression. For the remaining species, expression data can only be inferred from homologies. Ensembl genes are trackable throughout scientific databases as they are mapped to external identifiers in databases such as UniProt and RefSeq (sequence repositories across species), and MGD (Mouse Genome Database: a sequence repository for mouse) [18] or probeset IDs from Affymetrix (GeneChip®) [19] or Illumina® (BeadArrays) [20]. In this way, Ensembl expands beyond its own databases to other databases and literature in the scientific community.


    THE ENSEMBL GENE SET
 
Ensembl strives to provide the most accurate and up-to-date gene set possible. Where available, manually curated datasets are imported, such as the SGD (Saccharomyces Genome Database [21]) gene set for Saccharomyces cerevisiae, the WormBase [22] gene set for C. elegans, and the VEGA/Havana [23] set for Homo sapiens. The VEGA (vertebrate genome annotation) consortium [23] provides manual annotation of vertebrate genomes, focusing on regions in human, mouse, zebrafish, pig and dog. For species where manually curated evidence is not available, Ensembl annotates the gene set using a gene prediction pathway (or annotation pipeline). This is termed the genebuild [24]; it determines the Ensembl gene set using biological evidence, namely mRNA and protein information in databases such as UniProt/Swiss-Prot and annotated entries in RefSeq. Every resulting gene is based on at least one mRNA or protein, and in most cases an Ensembl gene has been determined using multiple pieces of evidence from comprehensive biological databases.

The Ensembl annotation pipeline is applied by the genebuild team [25]. A typical genebuild is performed over weeks, resulting in the Ensembl gene set of ‘known’ and ‘novel’ genes for a species. Two stages are applied in the full genebuild: the ‘targeted’ stage and the ‘similarity’ stage [26]. In the first stage, mRNA and proteins from the same species are aligned to the assembly for that species, using the GeneWise [27] algorithm. This results in the ‘known’ Ensembl genes, which match genes with IDs in public databases for the same species. ‘Novel’ genes result from the ‘similarity’ stage, in which mRNAs from the same species that were not aligned in the ‘targeted’ stage, and proteins from both the same and closely related species, are aligned to the assembly using GeneWise and Exonerate [28]. A separate approach is used for low-coverage assemblies. This genebuild is based on projection of human genes, using a whole-genome alignment between the low-coverage genome and the human genome. Specific information about the genebuild and assembly used for each species is found on the species-specific index page, along with species-specific news for the current release (for example, for human: Figure 2). For some organisms, like Drosophila melanogaster, genes are imported and matched to the assembly, in this case from FlyBase [29], a manually curated database of high quality. Yeast genes are imported from SGD to provide a simple eukaryote for comparative genomics studies. The evidence used to build each Ensembl gene can be readily viewed in the ‘supporting evidence panel’ of ExonView (see Figure 11, Module 1).


Figure 2: The human index page. The index page for each species provides information about the assembly (A), gene build (B) and new features in the latest release (C). Statistics of the gene build are shown in the lower right-hand corner (B) including date of the genebuild, and number of resulting genes and types (for example, pseudogenes, RNA genes, known and novel protein-coding genes).

 

Figure 11: The supporting evidence for the Ensembl HFE transcript ENST00000349999 in the ExonView page. The panel shows the mRNA and protein evidence used by the Ensembl annotation pipeline to make the HFE ‘ENST00000349999’ prediction. Dark green boxes represent exons supported by the evidence. Empty boxes represent exons not supported by an entry. Links are provided to the identifiers containing sequence, submission and database information. This panel is at the bottom of ‘ExonView’ (obtainable by clicking on ‘Exon information’ in the navigation column at the left of an Ensembl page).

 
The Ensembl genebuild procedure takes into account species-specific characteristics, such as duplicated genes in the teleosts [such as Danio rerio (zebrafish)] arising from a potential ancestral duplication in the genome [30, 31]. Ensembl strives to assess the quality of the gene set predicted by the annotation pipeline using experimental data. For the chicken genome, experimental methods relying on RT–PCR analysis showed Ensembl exon predictions to be 92% correct, with 94% of splice junctions correctly predicted by Ensembl [32]. Ensembl also compares genes predicted in the annotation pipeline to those determined by manual curators (VEGA [23]). This provides a quality check of the predicted gene set. Ensembl is a member of the CCDS consortium (http://www.ensembl.org/Homo_sapiens/ccds.html), in which NCBI, WTSI (Wellcome Trust Sanger Institute), UCSC and the EBI (European Bioinformatics Institute) aim to agree upon consensus coding sequences using one assembly, to obtain the most reliable set of genes possible. The CCDS set is available for human and mouse.

Annotation of pseudogenes is available for most species (human, mouse, rat, zebrafish, C. elegans, cow, armadillo, etc.). Ensembl transcripts that appear twice in the genome are tested for the features of processed pseudogenes: specifically a lack of introns and the presence of a poly(A) tail.
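The two features named above can be written as a simple check. The following Python sketch is illustrative only and is not the Ensembl pipeline's implementation: the function name, the use of a single exon to mean ‘lack of introns’ and the minimum poly(A) length of 10 are all assumptions made for the example.

```python
def looks_like_processed_pseudogene(exon_count, downstream_seq, min_poly_a=10):
    """Return True if a transcript copy has no introns and a poly(A) tract.

    exon_count     -- number of exons in the aligned copy (1 means no introns)
    downstream_seq -- genomic sequence immediately 3' of the copy
    min_poly_a     -- assumed minimum run of A's counted as a poly(A) tail
    """
    has_no_introns = exon_count == 1
    # Look for an uninterrupted run of A's of the assumed minimum length.
    has_poly_a = "A" * min_poly_a in downstream_seq.upper()
    return has_no_introns and has_poly_a
```

A real test would need to tolerate mismatches within the poly(A) tract and restrict where the tract may start; both are left out here for brevity.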

EST evidence is used to predict a separate set of Ensembl genes (Ensembl EST genes). These can be seen in ‘ContigView’ (Module 2). EST information can support an Ensembl protein-coding gene prediction, especially where an Ensembl EST gene prediction overlaps a gene from the genebuild, as these two gene sets come from two different lines of evidence. ESTs have been used to determine various splice isoforms of a gene [33], and the Ensembl EST gene set can be extracted from the database using the Perl API (information available at: http://www.ensembl.org/info/software/index.html), or by direct query of the ‘otherfeatures’ database with MySQL [34]. EST evidence may be excluded from the main ‘genebuild’ for a species because ESTs are often contaminated with genomic DNA, fragmented or affected by sequencing errors. For some species, however, high-quality EST evidence is used, especially where protein/mRNA information for that species is not extensive in public databases.
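The idea of an EST gene supporting a genebuild gene by overlap can be sketched as an interval comparison. The Python below is a minimal illustration, not Ensembl code: the `GeneModel` type, the requirement that both models lie on the same strand and the coordinate convention (1-based, inclusive) are assumptions.

```python
from typing import NamedTuple


class GeneModel(NamedTuple):
    chromosome: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: int  # +1 or -1


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if two gene models share at least one base on the same strand."""
    return (a.chromosome == b.chromosome and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def supporting_est_genes(gene, est_genes):
    """Return the EST gene models that overlap a genebuild gene."""
    return [est for est in est_genes if overlaps(gene, est)]
```

An overlap of a single base is a weak criterion; a stricter sketch could require shared exon boundaries instead.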

A ‘genebuild’ is performed every time a new assembly becomes available, or once a year. However, new gene and mRNA information in public databases such as RefSeq and UniProt, and annotation updates such as oligos and probes, can be included in Ensembl with every new release (i.e. every 2 months). These updates can be extracted from the Ensembl database and are shown in the GeneView page or ContigView, where tracks such as Ensembl genes and EMBL mRNAs can be drawn alongside the contig (or chromosome) (Figure 3). ‘Live’ information in external databases can be visualized in GeneView and ContigView using DAS (the distributed annotation system: see Modules 1 and 2). This information can be more current, depending on the database.


Figure 3: The ‘Detailed view’ panel of ContigView, showing the region of human chromosome 6 that contains the myosin VI gene. Five transcripts are shown: two from the Ensembl annotation pipeline (labelled 3 and 4) and three from the VEGA/Havana manual curation group (labelled 1, 2 and 5 in this figure). Human cDNAs, EMBL mRNAs across species, and human proteins that align to this genomic region are drawn as tracks [selected in the ‘Features’ menu (circled)]. Clicking on a track identifies the protein or mRNA in a pop-up window. (In this example, an EMBL mRNA track has been selected, showing AY691329, a myosin VI mRNA from Danio rerio.)

 

    ORTHOLOGY/PARALOGY PREDICTION
 
Orthologue and paralogue prediction is now carried out through the construction of phylogenetic trees across multiple species. The gene trees can be viewed in the GeneTreeView page (Figure 4), accessible by a link from the ‘GeneView’ page for any gene. NJTREE (http://treesoft.sourceforge.net/njtree.shtml) is used to generate trees from the longest translation of each Ensembl gene and to compare these against the species tree to determine speciation or duplication events. Nodes represent either duplication (red squares) or speciation events (blue squares), and in this way both recent and ancient paralogues can be detected. In addition to providing evolutionary information, the trees allow gene predictions to be checked across species as a method of quality control. Alignments can be exported using the ‘Export’ roll-down menu on this page. Furthermore, alignments may be viewed and exported using the JalView [35] option on this page.
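How duplication and speciation nodes translate into paralogue and orthologue calls can be shown on a toy tree. The Python sketch below uses the standard definition (two genes are orthologues if their last common ancestor is a speciation node, paralogues if it is a duplication node); it is not NJTREE's algorithm, and the tuple-based tree format and gene names are hypothetical.

```python
def leaves(node):
    """Return the set of gene names under a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(node, gene_a, gene_b):
    """Classify two genes by the event at their last common ancestor."""
    if isinstance(node, str):
        raise ValueError("genes must be two different leaves of the tree")
    event, left, right = node
    # Descend while one subtree still contains both genes.
    for child in (left, right):
        if {gene_a, gene_b} <= leaves(child):
            return relationship(child, gene_a, gene_b)
    if not {gene_a, gene_b} <= leaves(node):
        raise ValueError("both genes must be in the tree")
    return "orthologue" if event == "speciation" else "paralogue"


# Internal nodes are (event, left, right); leaves are gene names.
tree = ("duplication",
        ("speciation", "human_A", "mouse_A"),
        ("speciation", "human_B", "mouse_B"))
```

With this tree, `relationship(tree, "human_A", "mouse_A")` gives `"orthologue"`, while `human_A` and `human_B` are paralogues separated by a duplication that predates the human and mouse split.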


Figure 4: The GeneTreeView page above shows the gene tree for human myosin VI (MYO6) (circled). Gene trees are constructed using the longest translation of each gene across species. Orthologues and paralogues predicted from the trees can be seen on the GeneView page or they can be exported using BioMart. Alignments can be exported from the GeneTreeView page or viewed and exported using the JalView plug-in.

 
Multiple alignments are determined using Mercator (http://www.biostat.wisc.edu/~cdewey/mercator/) to build a synteny map and then Pecan (http://www.ebi.ac.uk/~bjp/pecan/; B. Paten, manuscript in preparation) to perform the alignments. These alignments can be seen in the browser on the AlignSliceView, ContigView and CytoView pages. All alignments can be exported using BioMart, and can be found on the ftp site ftp://ftp.ensembl.org/pub/ as emf files (see ‘current_multi_species’ datafiles). Pairwise alignments are also available, determined with BlastZ-net [36] analysis for closely related species (alignment on the nucleic acid level) and Translated-Blat [37] for more distant species (alignment on the protein level). From these alignments, syntenic blocks are determined with a cutoff of (currently) 100 kb. Syntenic blocks can be viewed in SyntenyView and CytoView.


    DATA ACCESS
 
Three main windows onto Ensembl genes and associated annotation are provided to the public: the Ensembl browser (http://www.ensembl.org) (Figure 5), BioMart (a data-mining tool that extracts information from the Ensembl databases) and the Perl API. All three are updated with every release. Furthermore, the core, variation, EST and comparative genomics databases are accessible through a public MySQL server. For access to the database (ensembldb.ensembl.org) specify ‘user’ as ‘anonymous’ (no password required). The API requires basic knowledge of Perl; BioMart does not require any programming knowledge.
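As a minimal sketch of the connection details given above, the following Python builds the argument list for the standard `mysql` command-line client. Only the host name and the ‘anonymous’ user come from the text; the helper name is hypothetical, the `database` argument is left unset because no schema names are given here, and a port option may also be needed depending on how the server is configured.

```python
def mysql_command(host="ensembldb.ensembl.org", user="anonymous", database=None):
    """Build the argument list for the standard `mysql` command-line client."""
    # No password option is added: the text states that none is required.
    args = ["mysql", f"--host={host}", f"--user={user}"]
    if database is not None:
        args.append(database)  # optional schema name, supplied by the caller
    return args


print(" ".join(mysql_command()))
```

The list form can be passed directly to `subprocess.run` on a machine where the `mysql` client is installed.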


Figure 5: The top of the main page at www.ensembl.org. The release date and version are shown on the upper left-hand corner of the Ensembl homepage (circled). Species with annotated genomes are listed on the right-hand side and a tab to the Pre! site is shown for newer assemblies. A ‘smart’ search is provided at the top.

 
The Ensembl browser not only gives wetlab researchers access to the gene set and annotation without the need to query the Ensembl database directly, but also displays genes and other features so that they can be understood and compared in the context of a chromosomal region. Furthermore, sequences, alignments and genomic features such as clone sets can be exported directly from the browser and BioMart. The BLAST [38] tool allows any sequence to be compared against any genomic assembly in Ensembl. BLAST searches in the Ensembl browser are carried out using WU-BLAST [39]. Alternatively, sequences can be compared against a cDNA or peptide set. BLASTN, BLASTP, BLASTX, TBLASTN and TBLASTX are all supported options in the Ensembl browser. An option for very fast alignment to nucleic acid sequence is included (SSAHA2 [40]). An example of how SSAHA can be used within Ensembl is described in reference [41]. Finally, the browser provides a window into comparative genomics, where one can view homology prediction, gene trees, alignments and protein family predictions.

The main page of Ensembl (Figure 5) lists all the species in the browser for which there is annotation and provides links to BLAST, BioMart and the API. The assembly used in the genebuild for each species is shown by the picture or name, and clicking on a species link brings the user to an index page with more information about that species, including statistics on the genebuild and species-specific news for the current release. News for each release is also shown here, browsable backwards into the archive sites. The column on the left provides navigation links for the website. Finally, a ‘smart search’ is provided at the top, and help pages and contact information can be accessed by clicking the blue button at the top right-hand corner of the page.

The browser is customizable. Optional user logins allow specific pages such as ContigView to be configured, not only as cookies but within stored preferences that can be accessed from any computer. Pages can also be bookmarked and notes can be attached within the browser. Furthermore, local data can be uploaded and displayed on an individual site. These customized pages can be shared using the ‘group’ function of the logins.

Links to pages in the browser can be found in the left-hand navigation column on every page, and within the page itself. From the GeneView and ContigView pages, the two major ‘views’ of the browser, virtually all of the other pages can be reached. Use the ‘Gene information’ link at the left to reach the GeneView page, or click on an Ensembl Gene ID link within a page. To reach the ContigView page, click on ‘Graphical view’ on the left, or follow links to a chromosomal region. More specific pages will be described in the modules, along with the links to reach these pages.

Links to BioMart are available from every browser page. BioMart (Figure 6) is a ‘data-mining’ tool developed to obtain datasets from the Ensembl databases quickly and effectively according to the user's query. Tables can be exported in various formats (HTML, text, Microsoft Excel) and sequences can be obtained in FASTA format using this program. Module 5 discusses the uses of BioMart and provides some examples.
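For readers unfamiliar with FASTA, each record is a header line beginning with ‘>’ followed by the sequence on one or more lines. The Python sketch below writes records in that layout; it is a generic illustration and not BioMart's exporter, and the function name and the 60-character line width are assumptions.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        for i in range(0, len(sequence), width):
            lines.append(sequence[i:i + width])
    return "".join(line + "\n" for line in lines)
```

For example, `to_fasta([("example_transcript", "ACGTACGT")], width=4)` yields a header line followed by two four-base sequence lines; the identifier and sequence here are made up.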


Figure 6: The BioMart interface is found by clicking on ‘BioMart’ from an Ensembl page. BioMart is a ‘data-mining’ tool that allows direct access of genes, sequences and annotation in the Ensembl databases without the necessity for programming knowledge. FASTA sequences and tables can be exported (in text, HTML, Excel or GFF format). Individual queries determine the specific gene sets and attached information. In addition to Ensembl genes for any species (the core database), the homology, alignment or variation databases can be selected.

 

    THE BROWSER: OVERVIEW AND PRACTICAL MODULES
 
The browser is extensive in what it attempts to show and make accessible. To keep current with protein and mRNA information in the scientific databases, along with new genomic sequence information, a new Ensembl release occurs every 2 months incorporating new data, such as cDNA mapping updates and SNP information, along with new assemblies and gene sets. Archive sites (Figure 7) are available extending back in time for at least 2 years, depending on the species. A summary of archive sites and assemblies used is found at http://archive.ensembl.org/assembly.html. In contrast to the archive sites, Ensembl also provides ‘Pre!’ sites that contain new assembly information that is not yet fully annotated (in version 43 there are six species in the Pre! site; Figure 1). These sites allow visualization and extraction of the newest sequence assembly information as soon as it is available.


Figure 7: The Ensembl homepage in the archive site (February 2006). Archive sites are available for every page in the browser extending back to October 2004. They are accessible from the link at the left-hand side of Ensembl pages. News can be browsed through all Ensembl releases and a table of assemblies used in previous releases across species is available at the following URL: http://archive.ensembl.org/assembly.html.

 
The first module focuses on one gene or transcript (GeneView, TransView) and demonstrates how the supporting evidence behind a gene prediction can be viewed (ExonView). Module 1 also provides an introduction to viewing external sources with DAS (the distributed annotation system) [42]. Module 2 describes how to view a chromosomal region and annotation for a section of the genome (ContigView, CytoView). Module 3 discusses variations (SNPView, GeneSNPView, TranscriptSNPView), Module 4 demonstrates comparative genomics options (GeneTreeView, AlignSliceView, GeneSeqAlignView and SyntenyView) and Module 5 provides an overview of BioMart.

Practical module 1: view information for a gene (The GeneView, TransView and ExonView pages)
To search for a gene, type a name or ID, optionally with the species of interest and the word ‘gene’, in the search box at the top of the home page and click ‘Go’ (Figure 8); for example, ‘human HFE gene’. Click on the Ensembl identifier (in this example, ENSG00000010704) in the search results to go to the ‘GeneView’ page. (A link to ContigView is also provided in the header of the search result.) Note that there is also a Vega gene annotated. Pages in the Ensembl browser are termed ‘views’. ‘GeneView’ (also reachable through the ‘Gene information’ link at the left of Ensembl pages) provides gene-specific information such as gene structure, number of transcripts, position on the chromosome, homology information, links to DAS (see below), identifiers in other databases (Figure 9), and protein domain predictions.


Figure 8: The search box. To search for the HFE (hereditary haemochromatosis protein precursor) gene in human, use the ‘smart search’ box at the top of the main page as shown in the picture. Alternatively, smaller search boxes can be found in the upper right-hand corner of other Ensembl pages.

 

Figure 9: ‘Similarity Matches’ for the human HFE gene. Once an Ensembl transcript is constructed, it is compared against other databases to find similar sequences. Here, IDs that match an Ensembl HFE transcript are shown in NCBI (RefSeq), UniProt and EntrezGene along with Agilent Probes. The ‘Target%ID’ is the percentage of the target sequence belonging to the external ID that matches to the Ensembl transcript. The ‘Query%ID’ is the percent of the Ensembl transcript that matches to the external sequence. This transcript is a member of the CCDS (Consensus CoDing Sequence) set (see text).
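The two percentages in this legend differ only in their denominator. The Python sketch below makes that explicit; it assumes a single count of matching positions shared by both measures, which is a simplification, since the text does not state how Ensembl counts matches. The function and parameter names are hypothetical.

```python
def percent_ids(matching_positions, target_length, query_length):
    """Return (Target%ID, Query%ID) for one similarity match.

    matching_positions -- number of aligned positions that match
    target_length      -- length of the external (target) sequence
    query_length       -- length of the Ensembl transcript (query)
    """
    if target_length <= 0 or query_length <= 0:
        raise ValueError("sequence lengths must be positive")
    target_pct = 100.0 * matching_positions / target_length
    query_pct = 100.0 * matching_positions / query_length
    return target_pct, query_pct
```

With made-up numbers, 900 matching positions against a 1000-residue external entry and a 1200-residue Ensembl transcript give a Target%ID of 90 and a Query%ID of 75.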

 
The distributed annotation system (DAS [42]) provides a means of allowing Ensembl to expand beyond its own databases by including information from external sources not housed in Ensembl. With DAS, information in databases worldwide can be viewed for an Ensembl gene, such as publications in PubMed (select the option ‘HUGO_text’ in the GeneView page, for human) (Figure 10A), or for a position on the assembly (in ContigView). To display this information, select one or more DAS options in the GeneView page and click ‘Update’. As DAS sources are not housed in Ensembl, but remain external, any recent modifications to those databases will be accessed by the Ensembl browser. DAS is a way of ‘reaching out’ beyond Ensembl's databases to expand the annotation available for a gene or chromosomal region, and to provide the newest information in those databases.
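Because DAS sources are queried over HTTP, a request for the features in a region is simply a URL. The Python sketch below assumes the DAS 1.5-style ‘features’ command, in which a segment is written as reference:start,stop; the server address and source name are hypothetical placeholders, not real Ensembl DAS sources.

```python
def das_features_url(server, source, reference, start, stop):
    """Build a DAS 'features' request URL for one segment of a reference."""
    if start > stop:
        raise ValueError("start must not exceed stop")
    base = server.rstrip("/")
    return f"{base}/das/{source}/features?segment={reference}:{start},{stop}"


# Hypothetical server and source; the coordinates are the human myosin VI
# locus on chromosome 6 quoted in the Figure 10 legend.
url = das_features_url("http://das.example.org", "my_source", "6",
                       76515646, 76683049)
```

A DAS server answers such a request with an XML document describing the features in the segment, which the browser then draws as a track.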


Figure 10: DAS sources selected in GeneView and ContigView. (A) DAS options in GeneView. PubMed entries are shown for the human myosin VI gene. These are current entries in the PubMed databases. DAS sources can be selected on this page and displayed using the ‘Update’ button. ‘Manage sources’ allows users to upload their own information for display. (B) GIS PETs and DECIPHER elements were selected from the ‘DAS Sources’ menu and displayed for the myosin VI locus in human (chromosome 6, base pairs 76 515 646–76 683 049) in ContigView (see text for references). GIS PETs and ChIP PETs are sequenced paired-end ditags from the Genome Institute of Singapore, and DECIPHER is a database of chromosomal shifts and imbalances linked to clinical information (DatabasE of Chromosomal Imbalance and Phenotype in Human using Ensembl Resources). These selected ‘DAS’ tracks reflect current information in external (to Ensembl) databases. Clicking on a track yields information about the mapped element, and a link to a description of the source.

 
From this page, links are available for the gene tree, ID history (to track Ensembl IDs in previous releases) and chromosomal position among others. To navigate through the pages (views) of Ensembl, use the yellow navigation column at the left of the page. Click on the link ‘Transcript information’ to go to the TransView page for this gene.

TransView contains much of the same information as the GeneView page, but focuses on a single transcript. A list of ‘similarity matches’ is shown: matches to gene IDs in other databases such as UniProt and Entrez Gene [43], phenotype IDs in MIM (Mendelian Inheritance in Man [44]) (Figure 9) and IDs for probes from Agilent, Illumina, etc. The base-pair sequence for the spliced transcript (exons only) is shown here. Protein sequence and SNPs can be drawn along the base-pair sequence.

The ExonView page shows intronic and flanking sequence as well as exons, and includes the supporting evidence for an Ensembl gene prediction. Click on ‘exon information’ at the left to reach the ‘ExonView’ page. This page is colour-coded to differentiate coding sequence (black), UTRs (untranslated regions, purple), intronic (blue) and flanking sequence (green). The display can be configured to show full or partial intronic sequence along with a variable flanking sequence length. At the bottom of the page is the ‘Supporting Evidence’ panel (Figure 11). This panel shows all mRNA and protein entries in public databases (UniProt/SwissProt, UniProt/TrEMBL and RefSeq) that were used to make an Ensembl transcript prediction.

Module 2: view a region of the chromosome
Gene information can also be viewed for a region of the chromosome, rather than starting with one specific ID. The Ensembl search function allows a chromosomal region to be directly accessed. For example, search for ‘mouse chromosome 2:152700000.152800000’ to visualize this base-pair range on ContigView, which displays a specific region of chromosome 2. SNP information can be displayed along the chromosome in this page using the ‘features’ roll-down menu in the ‘Detailed view’ panel of this page. One can also display Ensembl genes, genes in other databases, repeats and comparative genomics information, and other annotation using the roll-down menus in this display.
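A region query of this kind is a chromosome name followed by a start and end coordinate. The Python sketch below parses such a string; it is an illustration rather than the browser's own parser, and the separators it accepts (a hyphen, as well as the dot printed in the example above) are an assumption.

```python
import re

# name:start<sep>end, where <sep> is '-', '.' or '..' (assumed separators)
REGION = re.compile(r"^(?P<name>\w+):(?P<start>\d+)(?:-|\.{1,2})(?P<end>\d+)$")


def parse_region(text):
    """Parse 'chromosome:start-end' into (name, start, end)."""
    match = REGION.match(text.strip())
    if match is None:
        raise ValueError(f"not a region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    return match["name"], start, end
```

The species and the word ‘chromosome’ in the search phrase are left to the caller; only the coordinate part is handled here.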

DAS tracks can also be viewed in the ContigView page. In Figure 10B, ‘DECIPHER’ elements (http://decipher.sanger.ac.uk/) and ‘GIS PETs’ [45, 46] are shown alongside Ensembl transcripts. (See the figure legend for more information about these DAS sources.) Finally, users can display their own information on Ensembl pages: GeneView, ProtView, ContigView and CytoView. (See the ‘Manage sources’ link under ‘DAS’ in these pages.)

DAS sources are available across species. For example, for mouse, Fantom CAGE tags (Short ‘Cap-Analysis Gene Expression’ sequences from the ‘Functional Annotation of the mouse’ consortium at RIKEN) [47] and MICER clones (a library of targeting vectors for gene silencing and chromosome rearrangements in mouse) [48] can be viewed along the assembly in ContigView. The ‘Detailed view’ panel is highly customizable and provides a template upon which to display features along a chromosome or contig (Figure 12).


Figure 12: The ‘Detailed view’ panel of ContigView, shown for a region on mouse chromosome 2. Ensembl transcripts, mRNAs in the EMBL database, MICER clones and Fantom CAGE tags (the latter two using DAS; see text for references) are displayed along the contig (AL928862.13). A zoom bar or ladder is displayed at the top for ease of navigation. Features drawn above the contig are on the positive or forward strand of the chromosome, and those drawn under the contig are on the reverse strand. (Note: this is not the case for the clones, which are all drawn under the contig.)

 
ContigView can be reached by clicking on ‘Graphical view’ from most Ensembl pages. The ‘Graphical overview’ link directly under the ‘Graphical view’ link in the left-hand navigation column leads to ‘CytoView’, in which a larger region of the chromosome can be viewed (up to 50 Mb, compared with a maximum of 1 Mb for ContigView), although fewer annotation options are available. In CytoView, clone sets can not only be viewed along the chromosomal region as in ContigView, but can also be exported by chromosome, by a specified genomic region or for the whole genome. (Clone sets can also be exported with BioMart, under the ‘Genomic Features’ option in the Attributes (Features) page.)
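The two size limits quoted above suggest a simple rule for deciding which display can hold a region. The following Python sketch is illustrative only: `choose_view` is a hypothetical helper that is not part of Ensembl, and it assumes 1-based, inclusive coordinates.

```python
# Hypothetical helper (not an Ensembl function): pick a display for a region,
# using the size limits quoted in the text (1 Mb ContigView, 50 Mb CytoView).
CONTIGVIEW_MAX_BP = 1_000_000
CYTOVIEW_MAX_BP = 50_000_000


def choose_view(start: int, end: int) -> str:
    """Return a view able to display start..end (assumed 1-based, inclusive)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    length = end - start + 1
    if length <= CONTIGVIEW_MAX_BP:
        return "ContigView"
    if length <= CYTOVIEW_MAX_BP:
        return "CytoView"
    raise ValueError(f"region of {length} bp exceeds the 50 Mb CytoView limit")


# The mouse chromosome 2 range used as the search example in Module 2.
print(choose_view(152_700_000, 152_800_000))
```

The example range spans about 100 kb, so it falls within the ContigView limit.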

Module 3: variations
Most variation information in Ensembl is imported from dbSNP, though some is imported from resequencing projects such as the STAR project for rat. The imported variations (SNPs at single base-pair locations and in-dels) include flanking sequence. They are matched against the genome and stored in the database along with the SNP type in the context of an Ensembl transcript (for example, coding, noncoding, intronic), the allele and any ensuing peptide shift. SNPView and GeneSNPView portray variation information in depth, and SNPs can also be drawn on TransView, ProtView and ContigView. To reach SNPView, click on a SNP drawn in the TransView, ProtView or ContigView page. In the first two of these pages, SNPs can be drawn along the sequence (see the customization choices under the sequence). In ContigView (obtained through the ‘graphical view’ link), SNPs may be turned on using the ‘Features’ menu of the ‘Detailed view’ panel. In addition, BioMart can be used to access this variation information, either using the Ensembl core database or a SNP database (dbSNP and others). Finally, SNPs across strains can be displayed for mouse and rat (across breeds for dog, and across individuals for human) with TranscriptSNPView, which shows similarities and differences in SNPs between multiple strains (Figure 13). For example, SNPs in BALB/cByJ can be compared with those in 129X1/SvJ and the reference strain (C57BL/6J). (To find this page, click on ‘compare SNPs for transcript’ in the left-hand navigation column.) SNPs for different strains can be exported using BioMart. An in-depth discussion of variation resources will be presented elsewhere (Chen et al., manuscript in preparation).
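To make the idea of a SNP type ‘in the context of a transcript’ concrete, the sketch below labels a position using the three categories named above. It is a toy model rather than Ensembl's procedure: the function name, the forward-strand assumption and the reading of ‘noncoding’ as exonic sequence outside the coding region are all assumptions made for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def classify_snp(pos: int, exons: List[Exon], cds_start: int, cds_end: int) -> str:
    """Label a SNP position relative to one forward-strand transcript (toy model)."""
    tx_start = min(start for start, _ in exons)
    tx_end = max(end for _, end in exons)
    if pos < tx_start or pos > tx_end:
        return "outside transcript"
    if not any(start <= pos <= end for start, end in exons):
        return "intronic"  # inside the transcript span but between exons
    if cds_start <= pos <= cds_end:
        return "coding"
    return "noncoding"  # exonic but outside the coding sequence (UTR)


# Made-up transcript: three exons, coding sequence from 150 to 550.
exons = [(100, 200), (300, 400), (500, 600)]
for position in (120, 180, 250, 580):
    print(position, classify_snp(position, exons, cds_start=150, cds_end=550))
```

A real annotation also has to handle the reverse strand, splice sites and the effect of each allele on the peptide, none of which is attempted here.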


Figure 13: Strain variations are shown in TranscriptSNPView. TranscriptSNPView allows variation information to be directly compared across multiple strains of mouse. Strains can be selected in the roll-down menu at the top of the panel. This is the transcript SNP report for mouse Xkr7, a transmembrane protein. Access this page using the ‘Compare transcript SNPs’ link in the navigation column at the left of an Ensembl page (shown above the panel).

 
Module 4: comparative genomics
Homology predictions and alignments can be viewed and exported using either the browser or BioMart. Homologues (orthologues and paralogues) are predicted from gene trees and visible on the GeneView page. Click on any of the homologues for an alignment to the gene of interest. To view the gene tree for a gene, click on ‘Gene tree info’ at the left of the page to go to GeneTreeView (Figure 7). Multiple alignments are calculated across eutherian mammals (seven species are used: chimpanzee, cow, dog, human, macaque, mouse and rat) and amniotic vertebrates (10 species are used: cow, chicken, chimpanzee, dog, human, macaque, mouse, opossum, platypus and rat). These alignments are calculated using whole genomes and can be graphically displayed on AlignSliceView (use the ‘View alignment with ...’ link from the ContigView page). Export the aligned sequences on the nucleotide level from this view, or view and/or export the sequences using GeneSeqAlignView (the ‘Genomic sequence alignment’ link from the GeneView page). Alignments can also be exported through BioMart. Pairwise alignments are available using these same pages.
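One common way to read homology types off a gene tree is to look at the event at the last common ancestor of two genes: a speciation node implies orthologues, a duplication node implies paralogues. The sketch below applies that convention to a small hand-built tree. The `Node` class, the event labels and the gene names are hypothetical, and this is not necessarily the exact procedure used by Ensembl.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str                    # gene name for leaves, free label otherwise
    event: Optional[str] = None  # "speciation" or "duplication" (internal nodes)
    children: List["Node"] = field(default_factory=list)


def leaf_names(node: Node) -> Set[str]:
    """Collect the names of all leaves below (or at) this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two leaves by the event at their last common ancestor."""
    pair = {gene_a, gene_b}
    if gene_a == gene_b or not pair <= leaf_names(root):
        raise ValueError("both genes must be distinct leaves of the tree")
    node = root
    while True:
        # Descend while a single child still contains both genes.
        deeper = next((c for c in node.children if pair <= leaf_names(c)), None)
        if deeper is None:
            break
        node = deeper
    return "orthologue" if node.event == "speciation" else "paralogue"


# A duplication followed by a human/mouse speciation in each copy.
tree = Node("root", "duplication", [
    Node("copy_A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("copy_B", "speciation", [Node("human_B"), Node("mouse_B")]),
])
print(relationship(tree, "human_A", "mouse_A"))
print(relationship(tree, "human_A", "human_B"))
```

In this tree the first pair separates at a speciation node and the second at the duplication node, so they are reported as orthologues and paralogues respectively.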

Syntenic regions are determined using these alignments. Syntenic blocks for a chromosome with chromosomes of another species are displayed in Synteny View. Syntenic blocks can also be displayed in CytoView. MultiContigView allows two chromosomes from different species to be compared with each other. Conserved regions can be highlighted.

Module 5: BioMart
BioMart provides a tool to allow fast export of customized tables and sequences in a format useful to scientists [Microsoft Excel, text, HTML or FASTA (for sequences) format]. Annotation in Ensembl such as gene IDs, SNPs, GO terms and protein domains can all be obtained with this program. Sequences (such as: genomic, transcript, cDNA, protein and flanking sequences) can be exported in FASTA format with the option to customize the header.
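A FASTA record with a customized header can be pictured as follows. This Python sketch is not BioMart code: the function, the ‘|’ separator, the 60-character line width and the record contents are assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def to_fasta(record: Dict[str, str], header_fields: Sequence[str],
             seq_key: str = "sequence", width: int = 60) -> str:
    """Format one record as FASTA, building the header from the chosen fields."""
    header = ">" + "|".join(record[name] for name in header_fields)
    seq = record[seq_key]
    # Wrap the sequence into fixed-width lines.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"


# Made-up record; the identifier is a placeholder, not a real Ensembl ID.
record = {"gene_id": "GENE0001", "chromosome": "2", "sequence": "ACGT" * 20}
print(to_fasta(record, header_fields=["gene_id", "chromosome"]), end="")
```

Changing `header_fields` changes which annotation appears after the ‘>’ character, which is the kind of choice the header customization option offers.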

BioMart extracts data from the Ensembl databases according to the user's specifications, and is upgraded along with Ensembl with every release. There are three main phases of the web interface: (i) the database is selected, for example the Ensembl gene set for a species can be selected in ‘Dataset’, (ii) the ‘Attributes’ are chosen, allowing associated annotation to be attached to this gene set (such as position on the chromosome, associated SNPs, homologous genes in other species, IDs in other databases, etc.) and (iii) ‘Filters’ can be selected, allowing a subset of the gene set to be chosen (if desired) (Figure 14).


Figure 14: A BioMart output table is displayed. An example of a table constructed with BioMart. Columns are selected in the ‘Attributes’ section. Only rat genes on chromosome 3 were selected (in ‘Filters’). The table can be exported as text, HTML or in Microsoft Excel format.

 
As a practical introduction, the three phases of BioMart are marked A, B and C in Figure 15. Selected attributes and filters appear in the left-hand ‘summary’ sections of BioMart, and the output reflects these choices until they are deselected. The attributes determine the column headings of the BioMart output table, and the order in which they are selected determines the order of the columns.


Figure 15: The three main phases of BioMart are shown at the left of the interface. ‘Dataset’ is marked as ‘A’, and allows selection of the database and species. ‘Attributes’ (B) allows selection of column headings for the output table. ‘Filters’ (C) are optional, and allow selection of a subset of genes from the Ensembl database. Clicking A, B or C changes the options on the right-hand side of the BioMart page, allowing specific genes and annotation to be chosen.

 
The ‘attributes’ can be thought of as what one would like to know about the selected gene set (for example, the chromosomal positions of a set of genes, associated GO terms, variations or sequences). The optional third phase [the ‘Filter’ phase (Figure 15: C)] allows the gene set to be narrowed if information is not wanted for the entire gene set of a species. These ‘filters’ are specified by what the scientist already knows about his/her gene set. Chromosomal location, gene IDs and InterPro domains [49] are among the options that can be used to select a smaller gene set.
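The division of labour between the three phases can be mimicked with a small in-memory table: the dataset supplies the rows, the filters decide which rows are kept, and the attributes decide which columns appear and in what order. The sketch below is a toy model in Python, not BioMart itself; the records, field names and `run_query` function are invented for illustration.

```python
from typing import Any, Dict, List, Optional, Sequence

# Toy "dataset": one dict per gene. All identifiers and values are made up.
DATASET: List[Dict[str, Any]] = [
    {"gene_id": "GENE0001", "chromosome": "3", "start": 1_000, "end": 2_500},
    {"gene_id": "GENE0002", "chromosome": "X", "start": 5_000, "end": 7_200},
    {"gene_id": "GENE0003", "chromosome": "3", "start": 9_000, "end": 9_800},
]


def run_query(dataset: List[Dict[str, Any]], attributes: Sequence[str],
              filters: Optional[Dict[str, Any]] = None) -> List[List[Any]]:
    """Return a header row, then one row per gene that passes every filter."""
    filters = filters or {}
    rows: List[List[Any]] = [list(attributes)]  # column order = selection order
    for gene in dataset:
        if all(gene.get(key) == value for key, value in filters.items()):
            rows.append([gene[name] for name in attributes])
    return rows


# Genes on chromosome 3 only, reporting ID and start position.
for row in run_query(DATASET, ["gene_id", "start"], {"chromosome": "3"}):
    print(row)
```

Leaving `filters` empty returns every gene in the dataset, which corresponds to skipping the optional third phase.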

One example of a BioMart query is as follows: enter Ensembl gene IDs (for example, the HFE gene: ENSG00000010704) and obtain a corresponding list of official HGNC IDs determined by the HUGO Gene Nomenclature Committee [50] (for human), MGI [51] (for mouse), Entrez Gene and UniProt. This would be performed by specifying the Ensembl gene ID as a filter (click ‘Filters’ at the left of the BioMart window, enter ‘ENSG00000010704’ under ‘ID list limit’ in the ‘GENE’ section in the right-hand side of the window), and selecting HGNC (for human), Entrez Gene and UniProt IDs as attributes. Note that a row for each transcript is given in the output table.
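The same query can also be written down as a document rather than as a series of clicks. The sketch below assembles it in the XML query layout used by BioMart web services, without sending anything over the network. The dataset name and the filter and attribute names are assumptions that are not given in the text and may differ between releases; `build_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence


def build_query(dataset: str, attributes: Sequence[str],
                filters: Dict[str, str]) -> str:
    """Serialize a dataset, its filters and its attributes as a query document."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "0",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


xml_query = build_query(
    dataset="hsapiens_gene_ensembl",                 # assumed dataset name
    attributes=["ensembl_gene_id", "hgnc_id"],       # assumed attribute names
    filters={"ensembl_gene_id": "ENSG00000010704"},  # the HFE example above
)
print(xml_query)
```

The structure mirrors the three phases: one `Dataset` element, optional `Filter` elements and an ordered list of `Attribute` elements.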

A second example is to find all genes on chromosome X that are associated with a specific InterPro domain, for example the immunoglobulin-like domain (IPR013151). In this case, the InterPro domain and X chromosome are specified under ‘Filters’, and gene IDs are selected in the Attributes section.

Advanced queries can be carried out using a linked-in secondary dataset. More information about how to use BioMart is available at http://www.biomart.org, and short tutorials are available in video format in the Ensembl ‘Helpdesk’ section: Workshops Online. BioMart is also available in the archive sites.


    SUMMARY
Ensembl endeavours to provide a comprehensive, highly accurate and current representation of the genome for a variety of species (focusing on the vertebrates). The browser organizes and depicts (with graphs, diagrams and tables) a vast multitude of gene and sequence-associated information for the scientific community. Ensembl addresses the genome through a variety of browser pages (or Ensembl ‘Views’), and through databases publicly accessible by the Perl API (Application Programming Interface: a series of algorithms that allow extraction of specific data from a database) or through BioMart. In addition, genomic annotation is available for download via the ftp site. New releases are provided to keep current with the most recent entries and updates in scientific databases.

The Perl API is kept current with every release. Documentation is available, along with instruction and a tutorial. The Perl API accesses all Ensembl databases [core, variation, ‘otherfeatures’ (containing the EST genes) and ‘compara’, the comparative genomics database]. Support and discussion are offered through the ensembl-dev list (subscription instructions are here: http://www.ensembl.org/info/about/contact.html), and an ensembl-announce mailing list keeps users up-to-date on coming developments.

Technical support is also offered in the form of a ‘Helpdesk’. Scientists and programmers are encouraged to email questions or comments on any level to helpdesk{at}ensembl.org. Furthermore, detailed help pages are provided for Ensembl ‘views’. Clicking on the blue ‘Help’ button in the upper right-hand corner of any page returns page-specific information. Short instructional videos and slide presentations are available on the website, along with a worked example and glossary. Finally, Ensembl provides free workshops to instruct beginners and intermediate users in the website and/or the API.


    FUTURE GOALS
As sequence assemblies improve and mRNA and protein entries in databases become more comprehensive, the Ensembl gene set is updated to reflect this new information. The annotation pipeline is continuously compared against gene sets developed by manual annotators, and is improved in order to make the most accurate, biologically relevant gene predictions. As the amount of gene annotation in scientific databases increases, Ensembl aims both to contain this wealth of information and to organize the site so that complexity is minimized. Frequent releases and DAS allow Ensembl to incorporate the newest information about its genes, and Ensembl aims to maintain this currency despite the growing number of assemblies, which reflects an increasing number of organisms with sequenced genomes. Finally, Ensembl strives to reach outside the genome and trace a gene from its simple sequence out to cellular function, connecting it with the world of proteins within an organism, and making this information readily accessible to the scientific public.


Key Points

  • Ensembl develops its gene set based on experimental evidence for over 30 species.
  • The browser organizes and depicts genes and annotation for the scientific public to understand a gene in the context of the genome.
  • Ensembl uses comparative studies to understand relationships and predict homologies between genes across species.
  • The databases are open source and can be accessed using the MySQL language or the Perl API (Application Programming Interface) maintained and developed at Ensembl.
  • BioMart is included with every Ensembl release for advanced data extraction of genes, annotation and sequences from the Ensembl database without the need for programming.

 


    FOOTNOTES
 
Giulietta Spudich has a background in biochemical research in the US and UK, and joined the Ensembl Outreach and Training team in July 2006. She now develops educational materials for Ensembl and gives workshops worldwide.

Xosé M. Fernández-Suárez has a background in molecular biology. He is the Project Leader for Ensembl Outreach and Training. He has been involved at multiple stages of the Ensembl development and has given courses and talks on Ensembl worldwide.

Ewan Birney, a Senior Scientist at EMBL, is currently the Head of Nucleotide Data at the EBI and leads the EBI half of the Ensembl and Reactome projects. He has been involved in the analysis of nearly every metazoan genome sequence, both via his leadership of Ensembl and contribution of his own research, and is a leading member of the human genome community, directly involved in both the draft and finished analysis of the human genome. In 2003, Ewan was awarded the inaugural Francis Crick Prize from the Royal Society, presented to an outstanding young molecular biologist.


    References

  1. Hubbard T. Biological information: making it accessible and integrated (and trying to make sense of it). Bioinformatics (2002) 18(Suppl 2):S140.
  2. Birney E, Andrews TD, Bevan P, et al. An overview of ensembl. Genome Res (2004) 14:925–8.
  3. Birney E, Andrews D, Caccamo M, et al. Ensembl 2006. Nucleic Acids Res (2006) 34(Database issue):D556–61.
  4. Hubbard TJ, Aken BL, Beal K, et al. Ensembl 2007. Nucleic Acids Res (2007) 35(Database issue):D610–7.
  5. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science (2001) 291:1304–51.
  6. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature (2001) 409:860–921.
  7. Waterston RH, Lindblad-Toh K, et al, Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature (2002) 420:520–62.
  8. Gibbs RA, Weinstock GM, Metzker ML, et al. Genome sequence of the brown Norway rat yields insights into mammalian evolution. Nature (2004) 428:493–521.
  9. Sprague J, Bayraktaroglu L, Clements D, et al. The zebrafish information network: The zebrafish model organism database. Nucleic Acids Res (2006) 34(Database issue):D581–5.
  10. Goffeau A, Barrell BG, Bussey H, et al. Life with 6000 genes. Science (1996) 274:546, 563–7.
  11. C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science (1998) 282:2012–8.
  12. Myers EW, Sutton GG, Delcher AL, et al. A whole-genome assembly of drosophila. Science (2000) 287:2196–204.
  13. Bairoch A, Apweiler R, Wu CH, et al. The universal protein resource (UniProt). Nucleic Acids Res (2005) 33(Database issue):D154–9.
  14. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res (2007) 35(Database issue):D61–5.
  15. Harris MA, Clark J, Ireland A, et al. The gene ontology (GO) database and informatics resource. Nucleic Acids Res (2004) 32(Database issue):D258–61.
  16. Su AI, Cooke MP, Ching KA, et al. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA (2002) 99:4465–70.
  17. Kelso J, Visagie J, Theiler G, et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res (2003) 13:1222–30.
  18. Blake JA, Richardson JE, Bult CJ, et al. MGD: the mouse genome database. Nucleic Acids Res (2003) 31:193–5.
  19. Fodor SP, Rava RP, Huang XC, et al. Multiplexed biochemical assays with biological chips. Nature (1993) 364:555–6.
  20. Steemers FJ, Gunderson KL. Illumina, inc. Pharmacogenomics (2005) 6:777–82.
  21. Cherry JM, Adler C, Ball C, et al. SGD: Saccharomyces genome database. Nucleic Acids Res (1998) 26:73–9.
  22. Chen N, Harris TW, Antoshechkin I, et al. WormBase: a comprehensive data resource for caenorhabditis biology and genomics. Nucleic Acids Res (2005) 33(Database issue):D383–9.
  23. Ashurst JL, Chen CK, Gilbert JG, et al. The vertebrate genome annotation (vega) database. Nucleic Acids Res (2005) 33(Database issue):D459–65.
  24. Curwen V, Eyras E, Andrews TD, et al. The ensembl automatic gene annotation system. Genome Res (2004) 14:942–50.
  25. Potter SC, Clarke L, Curwen V, et al. The ensembl analysis pipeline. Genome Res (2004) 14:934–41.
  26. Fernández-Suárez XM, Searle S, Birney E. Ensembl's annotation pipeline and its use in eukaryotic genomes. In: Mulder N, Apweiler R (eds). In Silico Genomics and Proteomics: Functional Annotation of Genomes and Proteins (2006) New York: Nova Science Publishers, Inc. 109–23.
  27. Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome Res (2004) 14:988–95.
  28. Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics (2005) 6:31.
  29. Drysdale RA, Crosby MA, FlyBase Consortium. FlyBase: genes and gene models. Nucleic Acids Res (2005) 33(Database issue):D390–5.
  30. Brunet FG, Crollius HR, Paris M, et al. Gene loss and evolutionary rates following whole-genome duplication in teleost fishes. Mol Biol Evol (2006) 23:1808–16.
  31. Jaillon O, Aury JM, Brunet F, et al. Genome duplication in the teleost fish tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature (2004) 431:946–57.
  32. Eyras E, Reymond A, Castelo R, et al. Gene finding in the chicken genome. BMC Bioinformatics (2005) 6:131.
  33. Brett D, Hanke J, Lehmann G, et al. EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett (2005) 474:83–6.
  34. Stabenau A, McVicker G, Melsopp C, et al. The ensembl core software libraries. Genome Res (2004) 14:929–33.
  35. Clamp M, Cuff J, Searle SM, et al. The jalview java alignment editor. Bioinformatics (2004) 20:426–7.
  36. Schwartz S, Kent WJ, Smit A, et al. Human-mouse alignments with BLASTZ. Genome Res (2003) 13:103–7.
  37. Kent WJ. BLAT – the BLAST-like alignment tool. Genome Res (2002) 12:656–64.
  38. Altschul SF, Gish W, Miller W, et al. Basic local alignment search tool. J Mol Biol (1990) 215:403–10.
  39. Lopez R, Silventoinen V, Robinson S, et al. WU-Blast2 server at the European bioinformatics institute. Nucleic Acids Res (2003) 31:3795–8.
  40. Ning Z, Cox AJ, Mullikin JC. SSAHA: a fast search method for large DNA databases. Genome Res (2001) 11:1725–9.
  41. Fernández-Suárez XM, Schuster MK. Using the ensembl genome server to browse genomic sequence data. Current Protocols in Bioinformatics Supplement (2007) 16:1.15.1–1.15.36.
  42. Dowell RD, Jokerst RM, Day A, et al. The distributed annotation system. BMC Bioinformatics (2001) 2:7.
  43. Maglott D, Ostell J, Pruitt KD, et al. Entrez gene: gene-centered information at NCBI. Nucleic Acids Res (2007) 35(Database issue):D26–31.
  44. McKusick VA. Mendelian inheritance in man and its online version, OMIM. Am J Hum Genet (2007) 80:588–604.
  45. Chiu KP, Wong CH, Chen Q, et al. PET-tool: a software suite for comprehensive processing and managing of paired-end diTag (PET) sequence data. BMC Bioinformatics (2006) 7:390.
  46. Ng P, Wei CL, Sung WK, et al. Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation. Nat Methods (2005) 2:105–11.
  47. Shiraki T, Kondo S, Katayama S, et al. Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage. Proc Natl Acad Sci USA (2003) 100:15776–81.
  48. Adams DJ, Biggs PJ, Cox T, et al. Mutagenic insertion and chromosome engineering resource (MICER). Nat Genet (2004) 36:867–71.
  49. Mulder NJ, Apweiler R, Attwood TK, et al. InterPro, progress and status in 2005. Nucleic Acids Res (2005) 33(Database issue):D201–5.
  50. Eyre TA, Ducluzeau F, Sneddon TP, et al. The HUGO gene nomenclature database, 2006 updates. Nucleic Acids Res (2006) 34(Database issue):D319–21.
  51. Eppig JT, Blake JA, Bult CJ, et al. The mouse genome database (MGD): new features facilitating a model system. Nucleic Acids Res (2007) 35(Database issue):D630–7.


This article has been cited by other articles:

T. Wang and T. S. Furey. Analysis of Complex Disease Association and Linkage Studies Using the University of California Santa Cruz Genome Browser. Circ Cardiovasc Genet, April 1, 2009; 2(2): 199 - 204.

G. A. Reeves, K. Eilbeck, M. Magrane, C. O'Donovan, L. Montecchi-Palazzi, M. A. Harris, S. Orchard, R. C. Jimenez, A. Prlic, T. J. P. Hubbard, et al. The Protein Feature Ontology: a tool for the unification of protein feature annotations. Bioinformatics, December 1, 2008; 24(23): 2767 - 2772.

