Special Issues Papers |
Joining high-throughput technology with in silico modelling advances genome-wide screening towards targeted discovery
Corresponding author. Thomas Werner, Genomatix Software GmbH Bayerstrasse 85a, 80335 München Germany. Tel: +49 89 599766 0; Fax: +49 89 599766 55. E-mail: Werner{at}genomatix.de
| ABSTRACT |
|---|
Genome research has now entered the functional evaluation phase, and high-throughput (HT) methods provide an enormous amount of raw data for that purpose. However, functional verification still requires experimental regimens that are not suited to HT application, so an efficient discovery and selection process is needed to pinpoint biological mechanisms and processes for subsequent targeted verification. Regulatory networks and their underlying molecular mechanisms can now be deduced by interpreting HT data in the context of biological knowledge. Computational models of promoter structures are suitable for genome-wide searches, and a number of recent examples demonstrate their usefulness in predicting and selecting functional targets for experimental verification.
Keywords: microarray analysis, regulatory networks, promoter analysis, disease networks, promoter model
| INTRODUCTION |
|---|
Since the completion of the first draft sequence of the human genome was announced in 2001 [1], high-throughput (HT) applications have gained new momentum, as it was assumed that interpretation of the data would be straightforward. DNA microarrays for transcription analysis [2] and protein-protein interaction methods such as mass spectrometry of protein complexes [3] or genomic protein-protein interaction assays (such as yeast or mammalian two-hybrid systems) have become very popular in the quest for functional analysis of genomes.
However, it is now clear that HT technologies produce relatively flat data, more a catalogue of biological events and interactions than a source of understanding. Microarray data, for example, provide only a general summary of what is going on in terms of transcript levels, without directly revealing the mechanisms underlying the observed changes (or distinguishing different events that occur simultaneously). Protein interaction data usually result in long lists of pairwise connections, again without revealing the reasons for complex formation other than physical affinity. Nevertheless, recent applications of various HT approaches have provided an unprecedented amount of data, e.g. aiming at genome-wide coverage of basic protein interactions [4, 5] (Figure 1 top).
The focus of research is now shifting from the collection of data towards the functional interpretation of processes, aimed increasingly at the elucidation of functional gene networks and protein complexes. This is best exemplified by the contributions presented at recent meetings of the BITS conference (Hinxton Hall, 2005; Kazusa, Japan, 2004; and Hinxton Hall, 2003), which carries that intention in its title: Beyond the Identification of Transcribed Sequences. So-called post-genomics projects such as FANTOM2 [6] are geared towards using the large-scale data collections of earlier projects, especially to elucidate elaborate biological processes such as the cell cycle or the consequences of the multitude of alternatively spliced transcripts [7].
The more functional focus of current research has two consequences. On the laboratory side, schemes and setups have become more sophisticated and much harder to maintain on a large scale. Functional analysis very often requires methods that resemble the older deep-drilling approaches, such as microdissection of tissues or laser-capture single-cell assays, which are less suitable for broad screening. On the theoretical side, going beyond the flat connection structure that is the typical result of HT approaches requires more than statistical analysis, however sophisticated it may become. There is a clear need to incorporate biological knowledge and principles into the methods in order to elucidate the biological processes underlying the HT results.
Recent advances in theoretical analysis methods, termed in silico biology, have allowed dramatic progress in the biologically meaningful interpretation of HT data [8-10]. Proteins are principal players in biology, and their interactions are often direct, involving physical contact (proteomics) or clearly defined substrate interaction chains (metabolomics). Importantly, however, the links between all these individual pathways and processes on a genome-wide scale are usually encoded in regulatory networks that dictate when and where the proteins are expressed. These include signal transduction pathways as well as transcriptional regulatory mechanisms at the enhancer and promoter level. Therefore, advances in the elucidation of such regulatory networks are central to the understanding of diseases as well as normal physiological processes (Figure 1). The analysis of regulatory networks has required the development of completely new tools. In gene regulation, the functional organization of small elements takes precedence over sequence similarity [11]. With the availability of the mouse and rat genomic sequences [12], comparative genomics has developed into a powerful in silico tool, as phylogenetic conservation of any feature is considered a token of functionality. Once mechanistic details of a particular process or disease have been revealed, they can be used for very fast genome-wide scans [13]. In-depth laboratory analysis is then used to verify or falsify the results of the in silico scans rather than to carry out the screening itself, shifting the load of searching to the computational side.
We illustrate the successful combination of in silico and in vitro/in vivo studies with three recent examples, all related to diseases and the regulatory networks underlying disease-relevant processes. Diabetes is a very complex case, which involves a multitude of genes and genetic traits. One form of diabetes, maturity-onset diabetes of the young (MODY), is an example where direct involvement of transcription factors, in particular hepatocyte nuclear factors (HNF family), has already been established [14]. Naturally, genome-wide scans for binding sites of such transcription factors are one way to locate additional candidate genes potentially involved in MODY. Unfortunately, such searches usually produce an enormous number of matches unrelated to the process under investigation: not all binding sites are biologically functional, and HNF factors also fulfill other functions totally unrelated to MODY. One way to focus on disease-relevant processes is to restrict the search to genes with known involvement and to take advantage of phylogenetic conservation of the findings between human and mouse, as demonstrated by Lockwood et al. [15]. However, this approach does not allow for genome-wide searches, as the discriminative power of individual transcription-factor binding sites is insufficient. Doehr et al. [16] have demonstrated in an in silico study that comparative genomics can be applied to develop computational models of disease-related frameworks of multiple transcription factors. A genome-wide scan with these promoter models allowed the elucidation of regulatory networks associated with MODY-relevant processes even without direct involvement of HNF transcription factors. This directly opened the way to targeted verification of the revealed connections, which can now proceed using classical functional assays.
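The difference in discriminative power between a single binding site and a framework of several sites can be illustrated with a minimal sketch. The motifs `TGACC` and `GATA[AG]`, the spacing limits and both sequences are toy placeholders chosen for illustration only; they are not the models of the cited studies, and real promoter models typically rely on weight matrices rather than regular expressions.

```python
import re
from typing import List, Tuple


def find_sites(seq: str, pattern: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]


def find_framework(seq: str, first: str, second: str,
                   min_gap: int, max_gap: int) -> List[Tuple[int, int]]:
    """Return (start_first, start_second) pairs in which `second` begins
    min_gap..max_gap bases after the end of `first` (order matters)."""
    hits = []
    second_starts = find_sites(seq, second)
    for m in re.finditer(f"(?=({first}))", seq):
        first_end = m.start() + len(m.group(1))
        for start in second_starts:
            if min_gap <= start - first_end <= max_gap:
                hits.append((m.start(), start))
    return hits


# Toy sequences (assumptions, not real promoters).
# promoter_a carries both sites in the required order, 5 bases apart.
promoter_a = "CC" + "TGACC" + "AAAAA" + "GATAA" + "CC"
# promoter_b carries both sites too, but in the wrong order and far apart.
promoter_b = "GATAA" + "C" * 20 + "TGACC"

for name, seq in (("promoter_a", promoter_a), ("promoter_b", promoter_b)):
    single = bool(find_sites(seq, "TGACC")) and bool(find_sites(seq, "GATA[AG]"))
    framework = find_framework(seq, "TGACC", "GATA[AG]", min_gap=3, max_gap=10)
    print(name, "contains both single sites:", single, "| framework hits:", framework)
```

Both toy sequences contain each individual site, so a search for single sites cannot tell them apart, whereas only the first sequence satisfies the order and spacing constraints of the two-element framework.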
Another example of a mechanistic in silico analysis of a disease concerns prostate cancer, which afflicts one in seven males during their lifetime [17]. An elegant experimental study utilizing laser microdissection and subsequent microarray screening of gene expression identified a number of genes involved in the transition from prostatic intraepithelial neoplasia to invasive prostate cancer [18]. However, such analysis does not reveal the mechanisms behind the observed changes, which requires a more intensive analysis of regulatory processes. This was carried out successfully in another recent study, in which the authors used a set of only four known androgen-induced genes involved in prostate cancer to derive a promoter framework consisting of a combination of androgen receptor and GATA binding sites. They subsequently used this model to scan the whole human genome for additional matching promoters [19]. In vivo binding of androgen receptor could then be verified by chromatin immunoprecipitation (ChIP) in six of the eight cases analysed, demonstrating the power of the in silico approach.
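The general idea of deriving a two-element framework from a handful of training promoters and then using it as a scan filter can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the cited study: the motifs `AGAACA` and `GATA[AG]`, the helper `toy_promoter`, the tolerance of two bases and all sequences are hypothetical stand-ins.

```python
import re
from typing import List, Tuple

FIRST = "AGAACA"     # toy stand-in for a receptor binding site (assumption)
SECOND = "GATA[AG]"  # toy stand-in for a GATA-type site (assumption)


def pair_gaps(seq: str, first: str, second: str) -> List[int]:
    """Sorted gaps (in bases) between the end of each `first` match and the
    start of each `second` match located downstream of it."""
    first_ends = [m.end() for m in re.finditer(first, seq)]
    second_starts = [m.start() for m in re.finditer(second, seq)]
    return sorted(s - e for e in first_ends for s in second_starts if s >= e)


def derive_gap_range(training: List[str], first: str, second: str,
                     tolerance: int = 2) -> Tuple[int, int]:
    """Spacing range covering the closest site pair of every training
    promoter, widened by `tolerance` bases on both sides."""
    closest = []
    for seq in training:
        gaps = pair_gaps(seq, first, second)
        if not gaps:
            raise ValueError("training promoter lacks the ordered site pair")
        closest.append(gaps[0])
    return max(0, min(closest) - tolerance), max(closest) + tolerance


def matches_model(seq: str, first: str, second: str,
                  gap_range: Tuple[int, int]) -> bool:
    """True if any ordered site pair falls inside the derived spacing range."""
    low, high = gap_range
    return any(low <= g <= high for g in pair_gaps(seq, first, second))


def toy_promoter(gap: int) -> str:
    """Hypothetical sequence with both sites separated by `gap` filler bases."""
    return "TT" + "AGAACA" + "C" * gap + "GATAA" + "TT"


# Four toy training promoters, mirroring the size of the training set.
training_set = [toy_promoter(g) for g in (8, 10, 12, 9)]
model_range = derive_gap_range(training_set, FIRST, SECOND)

candidates = {
    "candidate_1": toy_promoter(11),            # spacing inside the range
    "candidate_2": toy_promoter(40),            # both sites, spacing too large
    "candidate_3": "TT" + "C" * 20 + "GATAA",   # second site only
}
selected = [name for name, seq in candidates.items()
            if matches_model(seq, FIRST, SECOND, model_range)]
print(model_range, selected)
```

Promoters selected in this way are only predictions; as in the study described above, each would still require experimental verification, for example by ChIP.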
The final example is related to renal diseases and deals with the difficult analysis of podocytes. These cells help form the filtration apparatus in the glomeruli of the kidney. Podocytes cannot be cultured without losing their functional properties, especially the formation of tight junctions, which represent the so-called slit diaphragm at the heart of the filtration process. We used a single gene with known podocyte expression, the nephrin gene, to derive a framework of four different transcription factors phylogenetically conserved in man, mouse and rat [20]. Before this model was used in a genome-wide search, its association with podocyte-specific expression was verified. The model was found in the promoter of another podocyte-expressed gene (ZO-1), where it is also conserved in the promoters of all three species. Coregulation of the two genes could be verified both by extensive RT-PCR studies in patient material and by microarray studies. The model was then used to identify six new candidate genes in a whole-genome search, all of which also showed conservation of the framework across orthologous promoters in the three species. Subsequent studies by RT-PCR, western blot and immunogold staining verified that the single new candidate tested so far was not only transcriptionally coregulated, as postulated by the model, but that the corresponding protein was also present in a functional complex together with the other two proteins in the slit diaphragm, demonstrating that the in silico promoter search has revealed a biological function.
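The conservation filter used in this example, which keeps only candidates whose orthologous promoters all carry the framework, can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the four site patterns in `SITES`, the maximum spacing of 20 bases, the helper `build` and the candidate names are hypothetical and do not reproduce the published model.

```python
import re
from typing import Dict, List, Optional

SPECIES = ("human", "mouse", "rat")
SITES = ["CCGG", "TGTG", "GCGC", "CTCT"]  # four toy site patterns (assumption)


def has_framework(seq: str, sites: List[str], max_gap: int) -> bool:
    """True if all sites occur in the given order, each one starting at most
    `max_gap` bases after the end of the preceding site."""
    def extend(idx: int, prev_end: Optional[int]) -> bool:
        if idx == len(sites):
            return True
        pattern = re.compile(f"(?=({sites[idx]}))")
        for m in pattern.finditer(seq, prev_end or 0):
            if prev_end is not None and m.start() - prev_end > max_gap:
                break  # matches come in increasing order, so stop here
            if extend(idx + 1, m.start() + len(m.group(1))):
                return True
        return False
    return extend(0, None)


def conserved_candidates(promoters: Dict[str, Dict[str, str]],
                         sites: List[str], max_gap: int) -> List[str]:
    """Genes whose promoter matches the framework in every listed species."""
    return sorted(
        gene for gene, by_species in promoters.items()
        if all(sp in by_species and has_framework(by_species[sp], sites, max_gap)
               for sp in SPECIES)
    )


def build(gaps: List[int], sites: List[str] = SITES) -> str:
    """Hypothetical promoter: the sites in order, separated by filler bases."""
    seq = "AAA" + sites[0]
    for gap, site in zip(gaps, sites[1:]):
        seq += "A" * gap + site
    return seq + "AAA"


PROMOTERS = {
    # Framework present in all three species, with slightly varying spacing.
    "candidate_1": {"human": build([5, 7, 4]), "mouse": build([6, 7, 5]),
                    "rat": build([5, 8, 4])},
    # The rat promoter lacks the fourth site.
    "candidate_2": {"human": build([5, 7, 4]), "mouse": build([5, 7, 4]),
                    "rat": build([5, 7], SITES[:3])},
    # The mouse promoter has the sites, but one spacing is far too large.
    "candidate_3": {"human": build([5, 7, 4]), "mouse": build([5, 60, 4]),
                    "rat": build([5, 7, 4])},
}

print(conserved_candidates(PROMOTERS, SITES, max_gap=20))
```

Requiring a match in every species trades sensitivity for specificity: a match that occurs by chance in one genome is unlikely to recur at the orthologous promoter in the other two.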
| CONCLUSIONS |
|---|
It is now well accepted that regulatory networks represent the governing processes that bind and control metabolic pathways and other processes of a living cell in response to various internal and external challenges [21]. It has become increasingly clear that many of these regulatory connections can be read from the sets of transcription-factor binding sites found in regulatory sequences such as promoters [22]. Theoretical work has long established that only organized sets of such factor binding sites bear transcriptional function [23], and there is now substantial experimental evidence to support this view, e.g. [24]. The examples provided above demonstrate that a tight combination of experimental results, theoretical analysis and experimental verification of in silico predictions is an excellent and widely applicable approach to making the best use of the vast amount of genomic sequence data available today for functional studies. Molecular and functional biology appears to be heading the way modern physics has gone over the last 50 years: new findings are first proposed by theoretical studies, and experiments are subsequently designed to find and verify the predicted structures and events. Theoretical evaluation of regulatory networks may well be biology's equivalent of quantum physics, justifying great efforts to improve methods as well as great expectations for the achievable results.
| Acknowledgements |
|---|
Part of this work was supported by the BFAM ring funding project of the BMBF, grant number 031U112B/031U212B (Analysis of regulatory regions), to T.W. and by SFB 571 C2 to P.J.N.
| FOOTNOTES |
|---|
Thomas Werner is the CEO and CSO of Genomatix Software GmbH and formerly studied aspects of transcription control in vitro and in silico at the GSF Research Center in Neuherberg.
Peter Nelson is on the faculty at the University of Munich where his group works on the identification of regulatory pathways important in the development and progression of chronic disease.
| References |
|---|
1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001; 291:1304-51.
2. Bono H, Yagi K, Kasukawa T, et al. Systematic expression profiling of the mouse transcriptome using RIKEN cDNA microarrays. Genome Res 2003; 13:1318-23.
3. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 2003; 422:198-207.
4. Rual JF, Venkatesan K, Hao T, et al. Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005; 437:1173-8.
5. Cusick ME, Klitgord N, Vidal M, et al. Interactome: gateway into systems biology. Hum Mol Genet 2005; 14:R171-81.
6. Forrest AR, Taylor D, Grimmond S. Exploration of the cell-cycle genes found within the RIKEN FANTOM2 data set. Genome Res 2003; 13:1366-75.
7. Zavolan M, Kondo S, Schonbach C, et al. Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res 2003; 13:1290-300.
8. Fisher MT, Nagarkatti M, Nagarkatti PS. Combined screening of thymocytes using apoptosis-specific cDNA array and promoter analysis yields novel gene targets mediating TCDD-induced toxicity. Toxicol Sci 2004; 78:116-24.
9. Cam H, Balciunaite E, Blais A, et al. A common set of gene regulatory networks links metabolism and growth inhibition. Mol Cell 2004; 16:399-411.
10. Cartharius K, Frech K, Grote K, et al. MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics 2005; 21:2933-42.
11. Werner T, Fessele S, Maier H, et al. Computer modelling of promoter organization as a tool to study transcriptional coregulation. FASEB J 2003; 17:1228-37.
12. Clamp M, Andrews D, Barker D, et al. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res 2003; 31:38-42.
13. Gailus-Durner V, Scherf M, Werner T. Experimental data of a single promoter can be used for in silico detection of genes with related regulation in the absence of sequence similarity. Mamm Genome 2001; 12:67-72.
14. Hitman GA, Sudagani J. Searching for genes in diabetes and the metabolic syndrome. Int J Clin Pract Suppl 2004; 3-8.
15. Lockwood CR, Bingham C, Frayling TM. In silico searching of human and mouse genome data identifies known and unknown HNF1 binding sites upstream of beta-cell genes. Mol Genet Metab 2003; 78:145-51.
16. Doehr S, Klingenhoff A, Maier H, et al. Linking disease-associated genes to regulatory networks via promoter organization. Nucleic Acids Res 2005; 33:864-72.
17. Jemal A, Murray T, Ward E, et al. Cancer statistics, 2005. CA Cancer J Clin 2005; 55:10-30.
18. Ashida S, Nakagawa H, Katagiri T, et al. Molecular features of the transition from prostatic intraepithelial neoplasia (PIN) to prostate cancer: genome-wide gene-expression profiles of prostate cancers and PINs. Cancer Res 2004; 64:5963-72.
19. Masuda K, Werner T, Maheshwari S, et al. Androgen receptor binding sites identified by a GREF_GATA model. J Mol Biol 2005; 353:763-71.
20. Cohen CD, Klingenhoff A, Boucherot A, et al. Comparative promoter analysis allows de novo identification of specialized cell junction associated proteins. Proc Natl Acad Sci USA 2006; in press.
21. de Jong H. Modelling and simulation of genetic regulatory systems: a literature review. J Comput Biol 2002; 9:67-103.
22. Pilpel Y, Sudarsanam P, Church GM. Identifying regulatory networks by combinatorial analysis of promoter elements. Nat Genet 2001; 29:153-9.
23. Klingenhoff A, Frech K, Quandt K, et al. Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity. Bioinformatics 1999; 15:180-6.
24. Boyer LA, Lee TI, Cole MF, et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005; 122:947-56.
