
Briefings in Functional Genomics and Proteomics 2006 5(1):32-36; doi:10.1093/bfgp/ell010

© Oxford University Press, 2006, All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

Special Issues Papers

Joining high-throughput technology with in silico modelling advances genome-wide screening towards targeted discovery

Thomas Werner and Peter J. Nelson

Corresponding author. Thomas Werner, Genomatix Software GmbH Bayerstrasse 85a, 80335 München Germany. Tel: +49 89 599766 0; Fax: +49 89 599766 55. E-mail: Werner{at}genomatix.de


    ABSTRACT
 
Genome research has now entered the functional evaluation phase, and high-throughput (HT) methods provide an enormous amount of raw data for that purpose. However, functional verification still requires experimental regimens that are not suitable for application in an HT style. This calls for an efficient discovery and selection process that pinpoints biological mechanisms and processes for subsequent targeted verification. Regulatory networks and underlying molecular mechanisms can now be deduced through the interpretation of HT-data in the context of biological knowledge. Computational models of promoter structures are suitable for genome-wide searches, and a number of recent examples demonstrate their usefulness in the prediction and selection of functional targets for experimental verification.

Keywords: microarray analysis, regulatory networks, promoter analysis, disease networks, promoter model


    INTRODUCTION
 
Since the completion of the first draft sequence of the human genome was announced in 2001 [1], high-throughput (HT) method applications have gained a new momentum as it was assumed that the interpretation of the data would be straightforward. The DNA microarrays for transcription analysis [2] and protein–protein interaction methods such as mass spectrometry of protein complexes [3] or genomic protein–protein interaction assays (such as yeast or mammalian two-hybrid systems) have become very popular in the quest for functional analysis of genomes.

However, it is now clear that the application of HT technologies produces relatively ‘flat’ data, which catalogue biological events and interactions rather than provide an understanding of them. Microarray data, for example, provide only a general summary of what is going on in terms of transcript levels, without directly revealing any mechanisms underlying the observed changes (or distinguishing between different simultaneously occurring events). Protein interaction data usually result in long lists of pairwise connections, again not revealing the reasons for complex formation other than physical affinities. Nevertheless, recent applications of various HT approaches have provided an unprecedented amount of data, e.g. aiming at genome-wide coverage of basic protein interactions [4, 5] (Figure 1 top).


Figure 1: Linking of HT results to the relevant biological context. The linear lists of expression changes and/or pairwise protein–protein interaction data can in some cases be linked to the functional biological context, as indicated by the dotted arrows. In many cases such links cannot be obtained, or only fragmentary links can be revealed. Using the HT-data to elucidate the underlying promoter networks, as exemplified by transcription-factor binding-site frameworks (indicated by the vertical boxes on top of the promoter boxes), often allows much better elucidation of the functional biological context. This principle is illustrated by three successful examples in this review. Y2H = yeast two-hybrid analysis, MS = mass spectrometry. The shaded objects in the pathway/network and protein-complex circles indicate the respective proteins that interact functionally. Promoters are indicated by the horizontal boxes in the central promoter network circle, and transcription-factor binding sites are indicated by the smaller vertical boxes on top of the promoters.

 
The focus of research is now shifting from the collection of data towards the functional interpretation of processes aimed increasingly at the elucidation of functional gene networks and protein complexes. This has been best exemplified by the contributions presented at the recent meetings of the BITS conference (Hinxton Hall, 2005, Kazusa, Japan, 2004 and Hinxton Hall, 2003), which bears that intention already in its title: Beyond the Identification of Transcribed Sequences. The so-called post-genomics projects such as FANTOM2 [6] are geared towards using the large-scale data collections of the earlier projects especially to elucidate elaborate biological processes such as the cell cycle or the consequences of the multitude of alternatively spliced transcripts [7].

The more functional focus of current research has two consequences. On the laboratory side, schemes and setups have become more sophisticated and much harder to maintain on a large scale. Functional analysis very often requires methods more like the old ‘deep drilling’ approaches, such as microdissection of tissues or laser-capture single-cell assays, which are less suitable for broad screening. On the theoretical side, going beyond the flat connection structure that is the typical result of HT approaches requires more advanced methods than statistical analysis, however sophisticated it may become. There is a clear need to incorporate biological knowledge and principles into the methods in order to elucidate the biological processes underlying the HT results.

However, recent advances in theoretical analysis methods, termed in silico biology, have allowed dramatic progress in the biologically meaningful interpretation of HT-data [8–10]. While proteins are principal players in biology, their interactions are often direct, involving physical contact (proteomics) or clearly defined substrate interaction chains (metabolomics). Importantly, the links between all these individual pathways and processes on a genome-wide scale are usually encoded in regulatory networks that dictate when and where the proteins are expressed. These include signal transduction pathways as well as transcriptional regulatory mechanisms at the enhancer and promoter level. Therefore, advances in the elucidation of such regulatory networks are central to the understanding of diseases as well as normal physiological processes (Figure 1). The analysis of regulatory networks has required the development of completely new tools. In gene regulation, the functional organization of small elements takes precedence over sequence similarity [11]. With the availability of the mouse and the rat genomic sequences [12], comparative genomics has developed into a powerful in silico tool, as phylogenetic conservation of any feature is considered a token of functionality. Once mechanistic details of a particular process or disease have been revealed, they can be used for very fast genome-wide scans [13]. The in-depth laboratory-based analysis is then used to verify or falsify the results of the in silico scans rather than to carry out the screening itself, shifting the load of searching to the computational side.

We illustrate the successful combination of in silico and in vitro/in vivo studies with three recent examples, all related to diseases and to the regulatory networks underlying disease-relevant processes. Diabetes is a very complex case, which involves a multitude of genes and genetic traits. One form of diabetes, maturity-onset diabetes of the young (MODY), is an example where direct involvement of transcription factors, in particular hepatocyte nuclear factors (HNF family), has already been established [14]. Naturally, genome-wide scans for binding sites of such transcription factors are one way to locate additional candidate genes potentially involved in MODY. Unfortunately, such searches usually produce an enormous number of matches not related to the process under investigation. Not all binding sites are biologically functional, and HNF factors also fulfill other functions totally unrelated to MODY. One way to focus on disease-relevant processes is to restrict the search to genes with known involvement and to take advantage of phylogenetic conservation of the findings between human and mouse, as demonstrated by Lockwood et al. [15]. However, this approach does not allow for genome-wide searches, as the discriminative power of individual transcription-factor binding sites is insufficient. Doehr et al. [16] have demonstrated in an in silico study that comparative genomics can be applied to develop computational models of disease-related frameworks of multiple transcription factors. A genome-wide scan with these promoter models allowed the elucidation of regulatory networks associated with MODY-relevant processes even without direct involvement of HNF transcription factors. This directly opened the way to targeted verification of the revealed connections, which can now proceed using classical functional assays.
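The difference in discriminative power between a single binding site and a framework of several sites can be made concrete with a small sketch. The code below is only an illustration under stated assumptions and not the method of the cited studies: the framework definition, the factor names other than HNF1, the spacing ranges and the toy promoters are all hypothetical, and binding-site predictions are assumed to be available already as (factor, position) pairs.

```python
from typing import NamedTuple

class Site(NamedTuple):
    factor: str  # transcription-factor family name
    pos: int     # position of the predicted site within the promoter (bp)

# Hypothetical framework: factor families in 5'->3' order, each with the
# allowed distance range (bp) to the previous element. Illustrative only.
FRAMEWORK = [("HNF1", None), ("SP1", (10, 60)), ("CREB", (5, 40))]

def matches_framework(sites, framework=FRAMEWORK):
    """True if the sites contain all framework elements in order, each
    within the allowed distance of the previously matched element."""
    ordered = sorted(sites, key=lambda s: s.pos)

    def extend(idx, prev_pos):
        if idx == len(framework):
            return True
        factor, spacing = framework[idx]
        for s in ordered:
            if s.factor != factor:
                continue
            if prev_pos is not None:
                lo, hi = spacing
                if not (lo <= s.pos - prev_pos <= hi):
                    continue
            if extend(idx + 1, s.pos):
                return True
        return False

    return extend(0, None)

# Toy promoters (hypothetical data).
promoters = {
    "geneA": [Site("HNF1", 100), Site("SP1", 130), Site("CREB", 150)],
    "geneB": [Site("HNF1", 40)],
    "geneC": [Site("HNF1", 10), Site("CREB", 30), Site("SP1", 200)],
}

# A single-site search accepts every promoter that carries an HNF1 site.
single_site_hits = [g for g, s in promoters.items()
                    if any(x.factor == "HNF1" for x in s)]
# The framework search additionally requires order and spacing.
framework_hits = [g for g, s in promoters.items() if matches_framework(s)]
```

In this toy data set all three promoters carry an HNF1 site, but only geneA satisfies the order and spacing constraints, which is the kind of reduction in unrelated matches that makes genome-wide scans with frameworks feasible.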

Another example of a mechanistic in silico analysis of a disease is prostate cancer, which afflicts one in seven males during their lifetime [17]. An elegant experimental study utilizing laser microdissection and subsequent microarray screening of gene expression identified a number of genes involved in the transition of prostatic intraepithelial neoplasia to invasive prostate cancer [18]. However, such analysis does not reveal the mechanisms behind the observed changes, which requires a more intensive analysis of regulatory processes. This was carried out successfully in another recent study, where the authors used a set of only four known androgen-induced genes involved in prostate cancer to derive a promoter framework consisting of a combination of androgen receptor and GATA binding sites. They subsequently used this model to scan the whole human genome for additional promoter matches [19]. In vivo binding of androgen receptor could then be verified in six of the eight cases analysed by chromatin immunoprecipitation (ChIP), demonstrating the power of the in silico approach.
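The idea of deriving a framework from a small set of coregulated promoters can be sketched as a search for binding-site combinations shared by all of them. The following is a deliberately simplified illustration, not the procedure used in [19]: it only looks for pairs of different factors lying within a maximum distance of each other in every training promoter. The function name, the distance threshold and the toy site positions are assumptions made for this example.

```python
from itertools import combinations

def common_pairs(training, max_dist=50):
    """Factor pairs found within max_dist bp of each other in every
    training promoter. `training` maps gene -> list of (factor, pos)."""
    shared = None
    for sites in training.values():
        pairs = set()
        for (f1, p1), (f2, p2) in combinations(sites, 2):
            if f1 != f2 and abs(p1 - p2) <= max_dist:
                # Store pairs in sorted order so that (AR, GATA) and
                # (GATA, AR) count as the same combination.
                pairs.add(tuple(sorted((f1, f2))))
        shared = pairs if shared is None else shared & pairs
    return sorted(shared or [])

# Four toy training promoters (hypothetical positions and extra sites).
training = {
    "g1": [("AR", 100), ("GATA", 120), ("SP1", 400)],
    "g2": [("GATA", 50), ("AR", 80), ("NFKB", 85)],
    "g3": [("AR", 10), ("GATA", 55)],
    "g4": [("SP1", 5), ("AR", 200), ("GATA", 230)],
}

shared_pairs = common_pairs(training)
```

With these toy promoters the only pair present in all four is the androgen receptor and GATA combination; pairs that occur in just one promoter are discarded by the intersection. A realistic model would also record strand orientation, order and distance ranges of the elements.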

The final example is related to renal diseases and deals with the difficult analysis of podocytes. These cells help form the filtration apparatus in the glomeruli of kidneys. Podocytes cannot be cultured without losing their functional properties, especially the formation of tight junctions, which represent the so-called slit diaphragm at the heart of the filtration process. We used a single gene with known podocyte expression, the nephrin gene, to derive a framework of four different transcription factors phylogenetically conserved in man, mouse and rat [20]. Before using this model in a genome-wide search, its association with podocyte-specific expression was verified. The model was found in the promoter of another podocyte-expressed gene (ZO-1), where it was also conserved in the promoters of all three species. Coregulation of the two genes could be verified both by extensive RT–PCR studies in patient material and by microarray studies. The model was then used to identify six new candidate genes in a whole-genome search, all of which also showed conservation of the framework across orthologous promoters in the three species. Subsequent studies by RT–PCR, western blot and immunogold staining verified that the single new candidate tested so far was not only transcriptionally coregulated as postulated by the model, but also that the corresponding protein was present in a functional complex together with the other two proteins in the slit diaphragm, demonstrating that the in silico promoter search has revealed a biological function.
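The conservation requirement used in this example amounts to keeping only those genes whose orthologous promoters match the model in every species considered. A minimal sketch of that filtering step is given below. It assumes that model matches have already been computed per species and that orthologous genes share one identifier; the function name and all gene identifiers other than nephrin and ZO-1 are hypothetical placeholders, and the hit lists are toy data rather than results from [20].

```python
SPECIES = ("human", "mouse", "rat")

def conserved_candidates(hits_by_species, species=SPECIES):
    """Genes whose promoter matches the model in every listed species.
    `hits_by_species` maps species -> iterable of gene identifiers."""
    gene_sets = [set(hits_by_species.get(sp, ())) for sp in species]
    return sorted(set.intersection(*gene_sets))

# Toy model matches per species (hypothetical data).
hits = {
    "human": ["nephrin", "ZO-1", "candidate_1", "candidate_2"],
    "mouse": ["nephrin", "ZO-1", "candidate_1"],
    "rat":   ["nephrin", "ZO-1", "candidate_1", "candidate_3"],
}

conserved = conserved_candidates(hits)
```

Here candidate_2 and candidate_3 are dropped because the model is missing from at least one orthologous promoter, so only matches supported in all three species remain for experimental follow-up.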


    CONCLUSIONS
 
It is now well accepted that regulatory networks represent the governing processes that bind and control metabolic pathways and other processes of a living cell in response to various internal and external challenges [21]. What has become increasingly clear is that many of these regulatory connections can be read from the sets of transcription-factor binding sites found in regulatory sequences such as promoters [22]. Theoretical work has long established that only organized sets of such factor binding sites bear transcriptional function [23], and there is now substantial experimental evidence to support this view, e.g. [24]. The examples provided above demonstrate that a tight combination of experimental results, theoretical analysis and experimental verification of in silico predictions is an excellent and widely applicable approach to make the best use of the vast amount of genomic sequence data available today for functional studies. Apparently, molecular and functional biology is heading the same way that modern physics has gone over the last 50 years: new findings were first proposed by theoretical studies, and experiments were subsequently designed to find and verify the predicted structures and events. Theoretical evaluation of regulatory networks may well be the equivalent of quantum physics in biology, justifying great efforts to improve methods as well as great expectations for the achievable results.


Key Points

  • Regulatory pathways controlled through gene networks underlie central aspects of the progression of chronic disease.
  • Regulatory networks dictate, on a genome-wide scale, when and where the proteins are expressed and interact in individual pathways.
  • New techniques now allow the elucidation of gene networks and protein complexes based on promoter analysis.
  • Comparative genomics is a powerful tool for in silico analysis as phylogenetic conservation strongly suggests functionality.
  • Mechanistic details of a particular process or disease can be used for rapid high-throughput genome-wide scans.

 


    Acknowledgements
 
Part of this work was supported by the BFAM ring funding project of the BMBF grant number 031U112B/031U212B ‘Analysis of regulatory regions’ to T.W. and by SFB 571 C2 to P.J.N.


    FOOTNOTES
 
Thomas Werner is the CEO and CSO of Genomatix Software GmbH and formerly studied aspects of transcription control in vitro and in silico at the GSF Research Center in Neuherberg.

Peter Nelson is on the faculty at the University of Munich where his group works on the identification of regulatory pathways important in the development and progression of chronic disease.


    References
 

  1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001; 291:1304–51.
  2. Bono H, Yagi K, Kasukawa T, et al. Systematic expression profiling of the mouse transcriptome using RIKEN cDNA microarrays. Genome Res 2003; 13:1318–23.
  3. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 2003; 422:198–207.
  4. Rual JF, Venkatesan K, Hao T, et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 2005; 437:1173–8.
  5. Cusick ME, Klitgord N, Vidal M, et al. Interactome: gateway into systems biology. Hum Mol Genet 2005; 14:R171–81.
  6. Forrest AR, Taylor D, Grimmond S. Exploration of the cell-cycle genes found within the RIKEN FANTOM2 data set. Genome Res 2003; 13:1366–75.
  7. Zavolan M, Kondo S, Schonbach C, et al. Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res 2003; 13:1290–300.
  8. Fisher MT, Nagarkatti M, Nagarkatti PS. Combined screening of thymocytes using apoptosis-specific cDNA array and promoter analysis yields novel gene targets mediating TCDD-induced toxicity. Toxicol Sci 2004; 78:116–24.
  9. Cam H, Balciunaite E, Blais A, et al. A common set of gene regulatory networks links metabolism and growth inhibition. Mol Cell 2004; 16:399–411.
  10. Cartharius K, Frech K, Grote K, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 2005; 21:2933–42.
  11. Werner T, Fessele S, Maier H, et al. Computer modelling of promoter organization as a tool to study transcriptional coregulation. Faseb J 2003; 17:1228–37.
  12. Clamp M, Andrews D, Barker D, et al. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res 2003; 31:38–42.
  13. Gailus-Durner V, Scherf M, Werner T. Experimental data of a single promoter can be used for in silico detection of genes with related regulation in the absence of sequence similarity. Mamm Genome 2001; 12:67–72.
  14. Hitman GA, Sudagani J. Searching for genes in diabetes and the metabolic syndrome. Int J Clin Pract Suppl 2004 3–8.
  15. Lockwood CR, Bingham C, Frayling TM. In silico searching of human and mouse genome data identifies known and unknown HNF1 binding sites upstream of beta-cell genes. Mol Genet Metab 2003; 78:145–51.
  16. Doehr S, Klingenhoff A, Maier H, et al. Linking disease-associated genes to regulatory networks via promoter organization. Nucleic Acids Res 2005; 33:864–72.
  17. Jemal A, Murray T, Ward E, et al. Cancer statistics, 2005. CA Cancer J Clin 2005; 55:10–30.
  18. Ashida S, Nakagawa H, Katagiri T, et al. Molecular features of the transition from prostatic intraepithelial neoplasia (PIN) to prostate cancer: genome-wide gene-expression profiles of prostate cancers and PINs. Cancer Res 2004; 64:5963–72.
  19. Masuda K, Werner T, Maheshwari S, et al. Androgen receptor binding sites identified by a GREF_GATA model. J Mol Biol 2005; 353:763–71.
  20. Cohen CD, Klingenhoff A, Boucherot A, et al. Comparative promoter analysis allows de novo identification of specialized cell junction associated proteins. Proc Natl Acad Sci USA 2006 in press.
  21. Jong H. Modelling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002; 9:67–103.
  22. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001; 29:153–9.
  23. Klingenhoff A, Frech K, Quandt K, et al. Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity. Bioinformatics 1999; 15:180–6.
  24. Boyer LA, Lee TI, Cole MF, et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005; 122:947–56.



