Briefings in Functional Genomics and Proteomics Advance Access originally published online on February 10, 2006
Briefings in Functional Genomics and Proteomics 2006 4(4):331-342; doi:10.1093/bfgp/eli004
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Designing experiments that aid in the identification of regulatory networks
Corresponding author. Jeremy S. Edwards, Department of Molecular Genetics and Microbiology, Cancer Research and Treatment Center, University of New Mexico Health Sciences Center; and Department of Chemical and Nuclear Engineering, University of New Mexico, Albuquerque, NM 87313, USA. Tel: +1 50 5272 5465; Fax: +1 50 5272 6029; E-mail: jsedwards{at}salud.unm.edu
| ABSTRACT |
|---|
|
|
|---|
Predictive mathematical models of the interactions of a genetic network can provide insight into the mechanisms of gene regulation, the role of various genes within a network and how multiple genes interact leading to complex traits. However, identification of the parameters and interactions is currently a limiting step in the development of such models. This work reviews the state of the art for design of experiments in biological systems and demonstrates the need for improved design of experiments through the use of a model system. Appropriate design of experiments has a profound impact on the ability to identify a model and on the quality of resulting identified model. Key issues include the selection of appropriate input sequences (e.g. random, independent multivariate inputs) and the selection of the sampling frequencies. This work demonstrates that these issues are especially important in the identification of biochemical networks and that the traditional biochemical approach is incapable of truly identifying the behavior present in such networks.
Keywords: sampling frequency, nonlinear dynamics, experimental design, biological regulatory networks
| BACKGROUND |
|---|
|
|
|---|
One of the key challenges facing biology today is model identification. Model identification involves finding a mathematical model of a biological network given a class of tentative model structures and experimental data (Figure 1) [1]. Many of the data currently available in the literature are insufficient for biologists to propose candidate models and to identify the parameters to predict cellular physiology [2]. To overcome the current limitations, experimental design must be addressed; namely, experiments must be designed to elucidate as much information as possible about the biological system at reasonable costs. Historically, biologists have applied a reductionist approach where components of cellular systems are studied one species and/or one variable at a time. The entire network was then treated as a sum of the parts. However, the advent of high-throughput technologies has led to the development of more holistic approaches. As high-throughput technologies become more commonplace and more accurate, it is essential that methods and tools be developed to take advantage of these valuable data.
|
High-throughput technologies have been developed for the measurement of all RNA species in a cell [35], certain metabolites [5, 6] and the protein levels within a cell [4, 7, 8]. Of these various profiling technologies, microarrays are the most extensively used. Microarrays can be used to assay relative mRNA levels between two experimental conditions [9]. In many experimental studies, mRNA profiles provide an estimate of the steady-state changes in mRNA levels for all of the species surveyed. By examining mRNA species that change from one condition to the next, attempts have been made to identify genes that are co-regulated or inversely regulated. For example, Jenkins and coworkers explored the effect of calcium on mammalian cell cycle regulation in this manner [10]. Additionally, Tavazoie and coworkers used K-mean clustering algorithms to determine which genes appear to move in a concerted manner under a variety of conditions [11]. These clusters of genes are assumed to be co-regulated and the upstream regions were then searched for potential cis-regulatory sites whose existence could then be verified experimentally [11].
These experimental approaches have been beneficial in generating hypotheses that have led to increased understanding of genetic regulation. However, previous experience in systems engineering reveals several potential pitfalls to avoid in applying such approaches to generating dynamic models of regulation. First, in order to understand the dynamic features of interest, the underlying system must be persistently excited to excite all the dynamics. Inappropriate excitation will not provide insight into desired dynamics. Second, sampling times for inferring co-regulation must be carefully chosen. Sampling at too high or too low frequency will mask dynamic features. Finally, single variable perturbations will not provide information on multivariate interactions, and hence multivariate perturbations should be utilized if the effects of such interactions are to be captured [12].
These points are well accepted in the systems engineering community. As biological systems are far more complex than the systems typically explored in the systems engineering community, these experimental design issues are even more critical for biological systems. However, as indicated in a recent review on yeast genomic expression studies, these issues are frequently overlooked in the design of biological experiments [13]. This is further evidenced by the complete lack of or cursory discussion of experimental design in many recent books on experimental procedures in transcriptomics [1419], metabolomics and proteomics [17, 20]. Some books even advocate experimental designs that cannot accurately identify the types of networks the experiments seek to explore [21] or refer to quality control issues as design of experiments while overlooking essential factors [22, 23]. Therefore, there is a pressing need to explore experimental design issues with respect to identifying biological networks. Each of the experimental design issues raised above will be discussed in greater detail below and explored through a model system.
| EXPERIMENTAL DESIGN CONSIDERATIONS |
|---|
|
|
|---|
Draghici provides a detailed set of guidelines for experimental design, summarized in Table 1 [23]. Each of the steps outlined is essentialhowever, in step 7 (design your experiment), several other factors need to be taken into account. The challenges and issues that need to be addressed arise because biological networks are transient in nature. To truly understand a biological regulatory network, some understanding of the transient interactions is necessary. These types of experiments cannot be obtained from steady-state data, but instead require time courses. Thus, the introductory design of experiments typically presented in most textbooks [2426] is insufficient and a more in-depth analysis is required [27, 28].
|
| PERSISTENT EXCITATION |
|---|
|
|
|---|
Persistent excitation refers to the frequent stimulation of a system to prevent near steady-state behaviors from dominating the observations [12]. Often in biological experiments, single step changes are made in one variable and the system response is followed as the system moves from an initial steady-state to the final steady-state. Such changes lead to results that are easily interpreted from a biochemical standpoint and are easily implemented experimentally. However, step changes do not persistently excite the network. Therefore, if the data is evenly sampled over the entire dynamic window, most of the data will be biased towards the approach to steady-state. As a result, many of the dynamic features cannot be practically identified, even with extensive prior knowledge [2].
Consider a first order, linear differential equation model of the mRNA levels within a cell controlled by an inducer. The response is completely characterized by its gain and time constant. The gain determines the final endpoint that results from the step change in inducer concentration. The time constant,
, is defined by how long it will take for a system to reach 63% of the steady-state. Consider the case where eight data points spaced by
min are taken starting at t = 0, when the step change is implemented (Figure 2A). The largest movement occurs between the first three measurements while the remaining data essentially describe the new steady-state. For more complex systems, much of the interesting behavior is lost by only looking at the approach to a single steady-state as most of the data emphasize the gain over the time-constant.
|
Persistent excitation overcomes this issue by making repeated changes to the system inputs at close intervals (<5
). The system, therefore, never completely approaches a new steady-state but instead is always in transit. This ensures that, in addition to the steady-state gain, a better description of the dynamics can be obtained and that complex effects (resulting from interactions and feedback common in biological systems) can be better described. | SAMPLING FREQUENCY |
|---|
|
|
|---|
Even if persistent excitation is used, data sampling frequencies must be appropriate to capture the inherent system dynamics of interest. Dynamic regulatory effects can be missed by considering only steady-state data or data collected at a very low frequency [29]. For example, a commonly observed biological phenomenon is inverse response. Inverse response occurs when the initial direction of the system's response to a step change in the input is opposite to the direction of the final steady-state [12]. A biological example of this can be found in glycolysis when an adenosine triphosphate (ATP) load is introduced to the cell [30]. Adenosine diphosphate (ADP) levels initially increase rapidly in response to the load (Figure 2B). Eventually, the ATP load leads to decreased adenosine phosphates as the cell compensates. These two competing effects occurring on distinctly different time scales give the inverse response effect. The sampling frequencies that only capture the approach to steady-state would miss this crucial dynamic feature.
As the sampling frequency increases, the material costs of the experiments increase. Further, above a certain sampling frequency additional information will not help (and may actually hurt). This effect arises due to noise in the measurements. The upper limit on the sampling frequency can be estimated from prior knowledge of the inherent dynamics of the biological system and/or the end use of the desired model.
| MULTIVARIATE INPUTS |
|---|
|
|
|---|
Despite persistent excitation and appropriate sampling frequency, the combined effects of multiple, interacting inputs can be missed if only single inputs are perturbed one at a time. In complex systems, multivariate interaction effects are impossible to predict from single variable perturbations [12, 31]. Consider the case of:
|
|
1 +
12u2. Similarly, changing u2 alone allows for estimation of the lumped parameter
2 +
12u1. However, an estimate of
12 cannot be obtained unless both u1 and u2 are varied simultaneously. These types of interactions are common in biological systems and can only be fully characterized by perturbing multiple inputs simultaneously. Ljung raised two pivotal questions a modeller must address when attempting to apply identification methods; How should I design the identification experiment? and Within what set of models should I look for a suitable description of the system? [32]. The answers to these questions depend in part on what the model is to be used for and the prior knowledge available about the system [1]. These represent constraints on the modelling exercise that are necessary to ensure that an appropriate solution can be found. The answer to the question of whether or not the resulting model is good enough will depend highly on the end use of the model. Questions such as how much error can be tolerated, which states need to be most accurately estimated, and what types of bias can be accepted are all functions of the model's use and need to be carefully explored prior to, during, and after, model identification. These issues will limit the applicability of the resulting model and will determine how many data are required to identify the model parameters [33].
Predictive mathematical models of the interactions and regulations of genetic networks can provide insight into mechanisms of gene regulation, the role of various genes within a network, and how several genes interact to produce complex traits. For these broad classes of problems, having a mathematical model can be useful for interpreting existing experimental data, designing new experiments, and proposing new hypotheses. To realize the full potential of such models, an understanding of the underlying experimental challenges of system identification is essential. In particular, the potential pitfalls of inappropriate excitation, sampling and multivariate interactions necessitate guidance regarding how and what types of data should be collected to identify biological models. This manuscript explores these three experimental design issues in greater detail. To provide perspective, both of Ljung's questions are explored in the context of a benchmark model system discussed below, with the greatest detail provided on the design of appropriate experiments. Issues such as input sequence design, frequency of perturbation and measurement, and multivariate interactions are explored.
| MODEL SYSTEM |
|---|
|
|
|---|
Figure 3 schematically shows a hypothetical three-state network with feedback. By using a system with a known mathematical description, a more complete analysis can be done than could be carried out using an experimental system. The benchmark network models the response of three arbitrary transcription factors to changes in three inducers. Each of the proteins influences the rate of synthesis of at least one other protein. The full nonlinear equations describing this network are:
|
|
|
|
|
|
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
In this work, the emphasis was on the identification of the parameters of the linear model. It should be noted that the emphasis could have been placed on the development of candidate model structures [1]. To identify model structure, physical principles, empirical relations and other a priori information play a key role. Typically, these model structures require nonlinear identification routines, which exacerbate the identification issues raised below.
In each case explored, identification was carried out as described in the methods below. The default state-space structure for the MATLABTM System Identification Toolbox assumes the form:
|
|
Two issues arise that necessitate validation studies. For completeness, these issues are briefly discussed. First, the parameters identified are those that best fit the data set using a specified model structure. To test the assumptions and prior knowledge that were used to achieve the desired model structure, validation against additional datasets is essential. Second, conclusions drawn from analysis of a dataset, whether qualitative (such as interactions) or quantitative (such as parameter estimates) are only as good as the dataset that was used to arrive at the knowledge. Improper experimental design can lead to invalid conclusions that can only be found by careful validation of the model against additional well designed experiments.
Random, independent inputs allow better identification
A common practice in experimental biology has been to vary one input to the biological system at a time and see how the system responds; e.g. heat shock [34], acetate stress [35], galactose addition [36], or diauxic shift [37]. Often, these changes are introduced as a step change in environmental conditions. Figure 5A (panels A and B) demonstrates the effectiveness of step response data in generating linear models.
|
Three linear models were identified from step response data (Figure 5A). The first data set consisted of three experiments. In each experiment, only one of the three system inputs was changed. Measurements were taken every fifteen minutes starting at t = 0 and continuing until t = 60 min with a step change introduced at t = 15 min. The second data set added to this an additional experiment in which all three inputs underwent a simultaneous step change at t = 15 min. Sampling was done once every 15 min for 60 min. This explored the effect of simultaneously varying all three inputs. The third data set explored the effect of sequentially varying each of the inputs. This data set consisted of the same three experiments in the first data set plus a fourth experiment in which a single step change was sequentially introduced to each of the three system inputs, spaced 15 min apart. Sampling was again done once every 15 min for 60 min.
The model performance for each of the models identified by each of the three step change datasets described above was assessed. The results demonstrate that step changes were ineffective at capturing the inherent dynamics of the underlying system (Figure 5A). For each of the three step change models identified, substantial errors were observed compared with the linear Taylor series model. Whereas the Taylor series model demonstrated less than 1% error for the validation sequence (Figure 4A), the step response models demonstrated as much as 5% error.
The model performance based on data from a series of random inputs to each of the three inputs was compared with the model performance derived from the step changes (Figure 5A). The random inputs were introduced once every 15 min and data were sampled once every 7.5 min. To make the comparison fair, the random input profile data were collected over 105 min. This corresponded to a total of 15 data points for each concentration species, the same as in the first step inputs experimental case. The performance of the linear Taylor series model was more closely approximated by the random inputs model than by the step change model. In particular, the random inputs model demonstrated less error and a tighter distribution around the true predicted values than the models identified from step change data. For the random input model, less than 2% error was observed between the identified linear model and the full nonlinear model.
In this study, it was assumed that the model structure was known (i.e. a linear state-space model) but that reliable estimates of the model parameters were not available. In such a case, where the degree of interaction is not fully characterized and the inherent dynamic properties of the system are not available, random input sequences such as those described above will perturb the system in most directions (i.e. relative linear combinations of inputs), thus increasing the probability of exciting the system along (or near) its eigenvectors. These random sequences will thus provide estimates of the key parameters within the network. Further, performing a series of step experiments to identify the model parameters will not provide reliable estimates [38, 39]. Therefore, a random input profile will provide the greatest insight.
Sampling frequency impacts model quality for random inputs
It could be argued that the observed differences between the step experiments and the random input profiles may be due to the different sampling frequencies between the two cases (15 min for the step changes and 7.5 min for the random input changes). To address this concern, the effect of increasing the sampling frequency for the step case on the quality of the predictions was determined (Figure 5B). Using the sampling frequency twice (once every 7.5 min) did not significantly improve the quality of the resulting identified model. Increasing to once every 5 min gave similar results. This indicated that the improved identification was due to dynamic excitation and not to measurement frequency.
However, with the random inputs, sampling frequency had a profound effect on the model identification (Figure 5C). Increasing the sampling frequency from once every 7.5 min to once every 5 min led to the identification of a model that more closely approximated the behavior of the linear Taylor series model. Further increasing the sampling frequency to once every 3 min actually led to decreased prediction quality (Figure 5C). At higher frequencies, noise in the dataset masks true dynamic features. As a consequence, erroneous parameter estimates are obtained that capture the dynamics of the noise. This demonstrated that there exists an optimal sampling frequency based on the perturbation frequency, the inherent system dynamics, and the noise in the dataset. Efficient and effective experimental design requires a priori identification of these optimal frequencies.
Identification from random perturbations more robust to noise
All of the models in Figure 5 were identified by adding random Gaussian noise with a standard deviation of 16% to the exact predictions of the nonlinear model. To explore the sensitivity of model quality to the level of noise, the standard deviation of the noise was varied from 2% up to 16% (Figure 6). For the step responses, no significant effect was observed as the noise level changed. The identified model was equally poor in each case. As the noise level decreased in the random input datasets, the identified model approached the linear Taylor series approximation. Thus, random perturbations were able to better extract underlying dynamic features from noisy data. Again, the number of data points was conserved across comparisons: each model was identified with a total of 15 data points.
|
Optimal frequency of perturbation
The effect of the perturbation frequency on model quality was investigated. In this case, the data again contained random Gaussian noise with a standard deviation of 16%. In addition to the optimal sampling frequency identified above, an optimal perturbation frequency exists (Figure 7). Perturbing once every 10 min and measuring once every 5 min identified a linear model similar to the linear Taylor series model. Analysis of the linear Taylor series model provided insight into why this was the case. The eigenvalues of the system presented in Equation (3) were [0.0503, 0.1006, 0.1502], indicating a stable system with characteristic time constants of approximately 19.9, 9.9 and 6.7 min. With a sampling time of 5 min and perturbations every 10 min, the first dynamic mode achieved approximately 40% of its change, the second mode achieved approximately 64% of its change, and the third mode achieved approximately 78% of its change before the next set of perturbations were introduced. Increasing the perturbation and measurement frequencies masked the dynamics of the slowest mode by limiting it to a much smaller fraction of change. Lower perturbation and measurement frequencies missed dynamic information by allowing the fastest mode to approach steady-state. Thus, a tradeoff between the competing modes existed. For complex biological systems, multiple perturbation frequencies may be required to explore all of the modes of interest.
|
Of the three perturbation frequency cases explored, perturbing once every 10 min and measuring once every 5 min identified the best model (Figure 7). Additional data should improve model quality. To test this, the same experiment was carried out for twice as long, as shown in Figure 7. The resulting model more closely approximated the linear Taylor series model. However, this effect was significantly less profound than the identification of the appropriate perturbation and measurement frequencies. This indicated that the identification of the appropriate time constant of perturbation is an essential component of the experimental design.
This work highlights the issues that need to be addressed with respect to the design of the perturbation frequencies. However, much additional work is needed. Namely, this work demonstrated that information needs to be present for all of the dynamic modes in order to obtain the most accurate model possible. Perturbation sequences that were too fast missed the slower dynamics in the system whereas slow perturbation sequences missed the fast dynamic features. Although it was not discussed here, significant work has been done in the systems engineering field that treats this very issue, typically by exploring the frequency domain of the process of interest [12, 31].
For most cases, the perturbation frequency can be identified based on the end use of the model. Knowledge of what the model will be used for can often identify the key dynamic time scales of interest. For example, if a model is to be used to predict responses on the scale of 1 h, dynamic events occurring on time scales faster than 1 min or slower than 10 h are likely to be of little importance. As models are typically developed for some end use, specification of the timescales for which the model is to be used is essential. Perturbations can then be designed around these time scales. If the resulting model is of inadequate quality, additional consideration of the design of the perturbation sequence and/or of the model structure may be required. The inadequacies may be due to missed interactions and/or nonlinear effects. To overcome these issues, the system may need to be perturbed at additional frequencies, the underlying model structure may need to be updated, or additional data may be required to allow for better estimates of the parameters.
Generalizing the approach
For a given model structure with a given set of parameters, there exists an optimal input sequence to obtain the best estimates of those parameters with the least amount of experimental work. However, that optimal input sequence design is dependent on both the model structure and on the actual parameter values. As the parameter values are not known a priori, this optimal input sequence design cannot be known a priori. Thus, every experiment is likely to be suboptimal. The goal then becomes to pick an experiment or a series of experiments that will be likely to provide useful information.
As indicated in Figure 1, identification is an iterative procedure. Once better estimates of the experimental parameters are known, a more refined input sequence can be used to excite the dynamics. The design of these refined input sequences is described elsewhere [3942]. The point here is that, while any given input sequence is incapable of providing network identification regardless of system properties, certain inputs are better able to discern between candidate interactions. Further, other inputs sequences are fundamentally incapable of accomplishing this task.
| CONCLUSIONS |
|---|
|
|
|---|
The appropriate design of experiments has a profound impact on the ability to identify a model and on the quality of resulting identified model. The typical approach for elucidating dynamic features of biological regulatory networks has been to make step changes in one variable at a time, observe the resulting dynamic response, and then attempt to back out qualitative and/or quantitative models to describe the observed dynamics. This work demonstrates that such models do not identify key dynamic features. Capturing the multivariate nature inherent to biological regulatory networks requires multivariate random perturbations. This is especially true when the underlying data contain high levels of noise, as in the case of most biological data.
Designing these multivariate random perturbations presents a challenge that needs to be addressed. This work demonstrates that the key issues are the identification of the perturbation and measurement frequencies and the number of measurements to take. Because of the effect of noise, the maximum measurement frequency should be no faster than approximately ten times the frequency of the fastest dynamics of interest. From the Shannon frequency, the measurement frequency should also be no less than twice the perturbation frequency. The frequency of perturbation is strongly dependent on the time scale of interest for the end use of the model and on the inherent dynamics of the underlying system. Using these guidelines as a starting point, more informative experiments can be designed and implemented to enhance the modelling and identification of regulation in biological networks.
Methods
The identification problem was formulated as: Given the datasets y(k), {k = 0, 1, 2, ...} generated by implementing the input sequence u(k), {k = 0, 1, 2, ...} on the true but unknown systems in Equation (2), obtain estimate values for the parameters A and B in the approximate linear model
|
|
Seventeen virtual experiments were carried out using the full nonlinear model as a data generator. These experiments are summarized in Table 2. The virtual experiments can be classified into three categories: step responses in one input at a time, step responses in multiple inputs, and independent, random variation in each input. For each experiment, u(k) was specified a priori and introduced to the full nonlinear model. The impact of noise on the quality of the resulting model was explored by adding 216% noise to the nonlinear model predictions, resulting in the datasets y(k).
|
The MATLABTM System Identification toolbox was used to identify continuous, linear state-space models for each of several combinations of virtual experiments. Each of the identified state-space models was then compared with the full nonlinear model in order to test model performance. The same series of inputs shown in Figure 4A were used to excite each of the models. The percentage error was defined for each state as:
|
|
Key Points
|
| Acknowledgements |
|---|
|
|
|---|
This work was supported by NASA GSRP Fellowship NGT5 50367 (KJK), NIH Grant 5P20 RR15588-02, and the US Department of Energy Division of Biological Sciences Microbial Cell Project.
| FOOTNOTES |
|---|
Kenneth J. Kauffman completed his Ph.D. at the University of Delaware under the guidance of Jeremy S. Edwards in 2003.
Babatunde A. Oguinanaike is the William L. Friend Professor of Chemical Engineering at the University of Delaware.
Jeremy S. Edwards is an assistant professor in the Department of Molecular Genetics and Microbiology, University of New Mexico Health Sciences Center and also in Department of Chemical and Nuclear Engineering, University of New Mexico, Albuquerque, NM 87313, USA.
| References |
|---|
|
|
|---|
- Bohlin T. Interactive System Identification: Prospects and Pitfalls. New York: Springer-Verlag 1991.
- Zak DE, Gonye GE, Schwaber JS, et al. Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identifiability analysis of an in silico network. Genome Res 2003; 13:2396405.
[Abstract/Free Full Text] - Napoli C, de Nigris F, Sica V. New advances in microarrays: finding the genes causally involved in disease. Methods Mol Med 2005; 108:21533.[Medline]
- Merrick BA, Madenspacher JH. Complementary gene and protein expression studies and integrative approaches in toxicogenomics. Toxicol Appl Pharmacol 2005; 207:Suppl 2, 18994.[CrossRef][Medline]
- Lelliott CJ, Lopez M, Curtis RK, et al. Transcript and metabolite analysis of the effects of tamoxifen in rat liver reveals inhibition of fatty acid synthesis in the presence of hepatic steatosis. Faseb J 2005; 19:110819.
[Abstract/Free Full Text] - Sriram G, Fulton DB, Iyer VV, et al. Quantification of compartmented metabolic fluxes in developing soybean embryos by employing biosynthetically directed fractional (13)C labeling, two-dimensional [(13)C, (1)H] nuclear magnetic resonance, and comprehensive isotopomer balancing. Plant Physiol 2004; 136:304357.
[Abstract/Free Full Text] - Streitz JM Jr., Madden MT, Marimanikkuppam SS, et al. Analysis of protein expression patterns in Barrett's esophagus using MALDI mass spectrometry, in search of malignancy biomarkers. Dis Esophagus 2005; 18:1706.[CrossRef][Medline]
- Dieguez-Acuna FJ, Gerber SA, Kodama S, et al. Characterization of mouse spleen cells by subtractive proteomics. Mol Cell Proteomics 2005; 4:145970.
[Abstract/Free Full Text] - Kal AJ, van Zonneveld AJ, Benes V, et al. Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources. Mol Biol Cell 1999; 10:185972.
[Abstract/Free Full Text] - Jenkins RE, Hawley SR, Promwikorn W, et al. Regulation of growth factor induced gene expression by calcium signalling: integrated mRNA and protein expression analysis. Proteomics 2001; 1:1092104.[CrossRef][Medline]
- Tavazoie S, Hughes JD, Campbell MJ, et al. Systematic determination of genetic network architecture. Nat Genet 1999; 22:2815.[CrossRef][ISI][Medline]
- Ogunnaike BA, Ray WH. Process Dynamics, Modeling and Control New York: Oxford University Press 1994.
- Gasch AP. Yeast genomic expression studies using DNA microarrays. In: Guthrie C, Fink GR (eds). Methods in Enzymology. New York: Academic Press, 2002 393414.
- Matson R. Applying Genomic and Proteomic Microarray Technology in Drug Discovery. New York: CRC Press 2004.
- Shoemaker J, Lin S. Methods of Microarray Data Analysis IV. New York: Springer-Verlag 2005.
- Kamberova G, Shah S. DNA Array Image Analysis Nuts and Bolts. Eagleville, PA: DNA Press 2002.
- Hardiman G. Microarrays Methods and Applications. Eagleville, PA: DNA Press 2003.
- Johnson K, Lin S. Methods of Microarray Data Analysis III. New York: Kluwer Academic Publishers 2003.
- Simon R, Korn E, McShane L, et al. Design and Analysis of DNA Microarray Investigations. Eagleville, PA: Springer-Verlag 2003.
- Hamdan M, Righetti P. Proteomics Today: Protein Assessment and Biomarkers using Mass Spectrometry, 2D Electrophoresis, and Microarray Technology. New York: John Wiley and Sons 2005.
- Baldi P, Hatfield G. DNA Microarrays and Gene Expression: From experiments to data analysis and modeling. Cambridge, UK: Cambridge University Press 2002.
- Speed T. Statistical Analysis of Gene Expression Microarray Data. New York: Chapman & Hall/CRC 2003.
- Draghici S. Data Analysis Tools for DNA Microarrays. New York: Chapman & Hall/CRC 2003.
- Anderson M, Whitcomb P. DOE Simplified: Practical Tools for Effective Experimentation. Portland, OR: Productivity, Inc 2000.
- Bevington P, Robinson D. Data Reduction and Error Analysis for the Physical Sciences. 3rd edn New York: McGraw Hill 2003.
- Montgomery D. Design and Analysis of Experiments. 6th edn New York: John Wiley and Sons 2005.
- Box G, Box J, Hunter W. Statistics for Experimenters: Design, Innovation, and Discovery. 2nd edn New York: John Wiley and Sons 2005.
- Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. 3rd edn Upper Saddle River, NJ: Prentice Hall 1994.
- Zak DE, Doyle FJ III, Gonye GE, et al. Simulation studies for the identification of genetic networks from cDNA array and regulatory activity data. Proc 2nd Intn Conf Systems Biol. 2001 pp. 2318.
- Kauffman KJ, Pajerowski JD, Jamshidi N, et al. Description and analysis of metabolic connectivity and dynamics in the human red blood cell. Biophy J 2002; 83:64662.
[Abstract/Free Full Text] - Skogestad S, Postlethwaite I. Multivariate Feedback Control: Analysis and Design. New York: John Wiley & Sons, Inc. 1996.
- Ljung L. Aspects on the system identification problem. Signal Processing 1982; 4:44556.
- Selinger DW, Wright MW, Church GM. On the complete determination of biological systems. Trends Biotechnol (in press).
- Richmond CS, Glasner JD, Mau R, et al. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res 1999; 27:382135.
[Abstract/Free Full Text] - Lasko DR, Schwerdel C, Bailey JE, et al. Acetate- specific stress response in acetate-resistant bacteria: an analysis of protein patterns. Biotechnol Prog 1997; 13:51923.[CrossRef][Medline]
- Dudley A, Aach J, Steffen M, et al. Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Nat Acad Sci USA 2002; 99:(11)75549.
[Abstract/Free Full Text] - Joseph E, Danchin A, Ullmann A. Regulation of galactose operon expression: glucose effects and role of cyclic adenosine 3',5'-monophosphate. J Bacteriol 1981; 146:14954.
[Abstract/Free Full Text] - Bendat JS. Nonlinear System Analysis and Identification from Random Data. New York: John Wiley & Sons 1990.
- Pearson RK. Discrete-Time Dynamic Models. New York: Oxford University Press 1999.
- Eykhoff P. System Identification: Parameter and State Estimation. New York: John Wiley & Sons 1974.
- Ljung L. System Identification: Theory for the User. Upper Saddle River, NJ: Prentice Hall PTR 1999.
- Pearson RK, Pottmann M. Gray-box identification of block-oriented nonlinear models. Journal of Process Control 2000; 10:30115.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





u2,
u3). Comparison of linear model predictions (symbols) versus nonlinear model values (solid lines) (B)
y1,lin, +y2,lin, (C) *y3,lin.


