Briefings in Functional Genomics and Proteomics Advance Access originally published online on May 26, 2006
Briefings in Functional Genomics and Proteomics 2006 5(2):144-153; doi:10.1093/bfgp/ell026
Special Issue Papers
Enhanced sequence coverage of proteins in human cerebrospinal fluid using multiple enzymatic digestion and linear ion trap LC-MS/MS
Corresponding author. Roger G. Biringer, Thermo Electron, 355 River Oaks Parkway, San Jose, CA 95134, USA. Tel: +1 408 965 6285; Fax: +1 408 965 6139; E-mail: roger.biringer{at}thermo.com
ABSTRACT
The cerebrospinal fluid (CSF) provides ready access to the health state of the central nervous system, and alterations in some CSF proteins have been documented in brain disease. However, the full range of CSF proteins is not known, and methods to identify protein components are still being developed. The goal of this study was to examine the sequence coverage obtained from human CSF digests produced with different proteases. Enzymatic digests of CSF proteins were obtained with arginine-C endopeptidase (ArgC), glutamic acid endopeptidase (GluC), chymotrypsin, trypsin and their combinations, and then examined using reverse-phase chromatography and a Finnigan™ LTQ™ linear ion trap mass spectrometer. Peptide sequences were identified with BioWorks 3.1, and sequence coverage was calculated for the 38 most confidently identified proteins. Trypsin and GluC yielded greater coverage than chymotrypsin, while ArgC gave the least sequence coverage. Protein sequence coverage changed only slightly across a dynamic range of protein abundance spanning four orders of magnitude. Combining the peptides derived from different proteases further increased coverage, and maximal sequence coverage was achieved by combining the digest results from GluC and trypsin. These results have implications for future studies aimed at identifying CSF proteins and their post-translational modifications.
Keywords: trypsin, glutamic acid endopeptidase
INTRODUCTION
The cerebrospinal fluid (CSF) is a readily accessible sample that can be analysed to gain insight into the health state of the central nervous system. Proteins are a major component of CSF, and some have established roles in clinical chemistry practice. Total protein is routinely measured, and any elevation beyond the normal range of 0.2–0.5 g/l raises concern for a pathological process [1]. An excess of immunoglobulin, usually detected as the appearance of oligoclonal bands, indicates an inflammatory neurological disorder and is especially associated with multiple sclerosis [2]. A number of disease-related proteins are of research interest but remain somewhat non-specific and, accordingly, have not been validated for clinical use. Examples include protein correlations with Alzheimer's disease [3], Parkinson's disease [4] and schizophrenia [5].
The first example of a discovery-based CSF protein analysis that led to a useful clinical test was the discovery of 14-3-3 proteins with a two-dimensional (2D) gel-based approach [6, 7]. In cases of dementia suspected to be Creutzfeldt–Jakob disease (CJD), the presence of immunoreactive 14-3-3 has been validated as a useful test that supports the diagnosis of CJD [8]. Beyond this single example, we believe that studying more CSF proteins will yield useful knowledge of their importance in brain function [9]. This may have an impact on many common brain disorders that currently have no objective marker and whose pathophysiology is unknown. Accordingly, more extensive discovery-based experiments are being pursued to identify as many CSF proteins as possible and then investigate them for disease correlates.
Methodologies for discovery-based searches substantially increase the ability to identify and quantify proteins, and include various forms of electrophoresis, chromatography and mass spectrometry (MS). Two-dimensional gel electrophoresis and liquid chromatography/mass spectrometry (LC/MS) are the most common approaches currently in use, and top-down LC/MS is on the horizon [10]. With these methods, more human CSF proteins have been detected [11–24]. We have identified 2000 different proteins thus far [25]. Since neurexin genes have thousands of alternatively spliced products [26] and proteins such as prostaglandin D synthase have more than ten different isoforms [27], we expect the final CSF protein diversity to increase many-fold when alternative splicing and post-translational modifications are taken into account.
Improvements in our ability to identify proteins more rapidly with accuracy and precision, to measure their abundance and to define subtler changes (from genetic variation, splicing or post-translational modification) are required to address this diversity and advance the field further. For LC/MS-based methodologies, the confidence of peptide identification, the ability to quantify identified proteins and the ability to detect discrete modifications in protein sequence and structure all increase as more peptides are identified. Presently, relatively few published studies use multiple enzymes to aid the identification or characterization of proteins in complex mixtures. Enhanced coverage has been shown for purified proteins or simple mixtures [28, 29], and a limited number of proteases have been assessed in more complex systems [29, 30]. To our knowledge, none of these provides data that demonstrate the general applicability of this methodology. The goal of this study was to explore methods to improve sequence coverage by examining human CSF digests produced with different proteases. Enzymatic digests of CSF proteins were obtained with ArgC, GluC, chymotrypsin, trypsin or combinations of enzymes, and then examined with LC/MS. The degree of sequence coverage was found to vary among protein substrates, but to be directly related to the natural abundance of the amino acids that define each protease's cleavage site. Further, combining the results obtained with different proteases gives a sequence coverage that is significantly higher than that obtained with either individual data set alone.
METHODOLOGY
Preparation of CSF protein digests
An aliquot of human CSF containing 2.8 mg total protein was denatured by adding solid urea (720 mg urea to 2.0 ml of CSF). A 14.0 µl aliquot of dithiothreitol (DTT) solution (30 mg in 1 ml of 100 mM ammonium bicarbonate, pH 8) was added to the CSF/urea solution, mixed by gentle vortexing and incubated at room temperature for 1 h. Next, 56.0 µl of alkylating agent (36 mg of iodoacetamide in 1 ml of 100 mM ammonium bicarbonate, pH 8) was added, mixed by gentle vortexing and incubated at room temperature in the dark for 1 h. Excess iodoacetamide was then quenched by adding 56.0 µl of DTT solution, followed by gentle vortex mixing and incubation at room temperature for 1 h. Excess reagents were removed and the buffer was exchanged (3×) to 100 mM ammonium bicarbonate (pH 8) by spin filtration (Viva Spin 500) to a final volume of 140 µl.
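As a quick sanity check on the reduction and alkylation chemistry, the reagent amounts above can be converted to moles. The sketch below assumes standard molar masses (urea 60.06, DTT 154.25, iodoacetamide 184.96 g/mol) and ignores the volume added by dissolving solid urea, so the urea molarity it reports is a nominal upper bound. On these assumptions, iodoacetamide is present at about a four-fold molar excess over the reducing DTT, and the quenching DTT roughly matches the iodoacetamide.

```python
# Assumed standard molar masses (g/mol); volumes and stock concentrations
# are taken from the protocol text.
MOLAR_MASS = {"urea": 60.06, "DTT": 154.25, "iodoacetamide": 184.96}


def stock_mass_mg(volume_ul: float, stock_mg_per_ml: float) -> float:
    """Mass (mg) delivered by a volume (µl) of a stock solution (mg/ml)."""
    return volume_ul / 1000.0 * stock_mg_per_ml


def micromoles(mass_mg: float, molar_mass: float) -> float:
    """Convert a mass in mg to an amount in µmol."""
    return mass_mg / molar_mass * 1000.0


urea_umol = micromoles(720.0, MOLAR_MASS["urea"])
dtt_reduce_umol = micromoles(stock_mass_mg(14.0, 30.0), MOLAR_MASS["DTT"])
iaa_umol = micromoles(stock_mass_mg(56.0, 36.0), MOLAR_MASS["iodoacetamide"])
dtt_quench_umol = micromoles(stock_mass_mg(56.0, 30.0), MOLAR_MASS["DTT"])

# Nominal molarity in 2.0 ml of CSF: µmol/ml equals mmol/l, so divide by 1000.
urea_molar_nominal = urea_umol / 2.0 / 1000.0

print(f"urea          {urea_umol:8.0f} µmol (~{urea_molar_nominal:.1f} M nominal)")
print(f"DTT (reduce)  {dtt_reduce_umol:8.2f} µmol")
print(f"iodoacetamide {iaa_umol:8.2f} µmol")
print(f"DTT (quench)  {dtt_quench_umol:8.2f} µmol")
print(f"IAA : reducing DTT = {iaa_umol / dtt_reduce_umol:.1f} : 1")
```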
Trypsin, GluC and chymotrypsin digests were prepared in a similar manner. Reduced-alkylated CSF proteins were diluted to 11–14 µg/µl with 100 mM ammonium bicarbonate (pH 8). Trypsin, GluC or chymotrypsin (all from Princeton Separations, Inc.) was added to give a protein:protease ratio of 50:1 by mass, and the mixture was incubated overnight at 37°C (typically 14–16 h). The reaction was quenched by adding glacial formic acid to reach pH 3.
Digests with GluC and ArgC were prepared under slightly different conditions. ArgC digests were prepared by diluting reduced-alkylated CSF proteins to 1.2 µg/µl with 100 mM ammonium bicarbonate (pH 8). ArgC was added to give a protein:protease ratio of 20:1 by mass, and the mixture was incubated overnight at 30°C (typically 14–16 h). Additional GluC digests were prepared in an identical fashion to allow comparison of the two sets of experimental conditions. GluC–ArgC digests were prepared similarly: GluC digestion was initiated and allowed to proceed for 8 h, ArgC was then added (20:1 protein:enzyme), and the combined reaction was allowed to continue overnight (typically 14–16 h).
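The proteases differ mainly in which residues they cut after, and that determines the number and length of the peptides produced from each protein. The following is a minimal in silico digestion sketch under simplified, textbook specificity rules: trypsin after K/R unless followed by P; ArgC after R (no proline rule applied here); GluC after E, as in bicarbonate buffer; chymotrypsin after F/Y/W unless followed by P. These rules are an assumption for illustration. They ignore secondary specificities and enzyme kinetics, and they are not the settings used in the search software.

```python
import re
from typing import List

# Simplified cleavage rules (illustrative assumption): each pattern matches
# the zero-width position immediately after a cleavable residue.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "ArgC": r"(?<=R)",
    "GluC": r"(?<=E)",  # bicarbonate buffer; in phosphate buffer D is also cut
    "chymotrypsin": r"(?<=[FYW])(?!P)",  # high-specificity residues only
}


def cleavage_sites(sequence: str, enzyme: str) -> List[int]:
    """Return 0-based cut positions strictly inside the sequence."""
    rule = re.compile(CLEAVAGE_RULES[enzyme])
    return [m.start() for m in rule.finditer(sequence)
            if 0 < m.start() < len(sequence)]


def digest(sequence: str, enzyme: str, missed_cleavages: int = 0) -> List[str]:
    """Generate peptides, allowing up to `missed_cleavages` skipped sites."""
    bounds = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j runs over the next 1 + missed_cleavages boundaries, if present.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


toy = "GASKLLRPVVKME"  # hypothetical toy sequence
for enzyme in CLEAVAGE_RULES:
    print(enzyme, digest(toy, enzyme))
```

In the toy sequence the R–P bond is skipped by trypsin but cut by ArgC under these rules, which illustrates how specificity alone changes peptide boundaries.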
NanoLC-ESI ion trap MS
A modified version of the PepFinder Kit with a Finnigan Surveyor HPLC, autosampler and nanoflow solvent delivery system (Thermo Electron Corporation, San Jose, CA, USA) was used to present the CSF sample to a Finnigan™ LTQ™ ion trap mass spectrometer equipped with a nano-electrospray ion source (Thermo, San Jose, CA, USA) and a 30 µm PicoTip emitter (New Objective, Inc.). The PepFinder Kit was modified from its original form to contain a 5 × 0.3 mm Zorbax C-18 peptide trap (Agilent Technologies) or a 0.075 × 25 mm Biobasic C18 IntegraFrit Trap (New Objective, Woburn, MA, USA), combined with a 100 µm internal diameter × 25 cm BetaBasic 18 nanobore C-18 separation column (Thermo). In a typical experiment, 6 µg of CSF protein digest was injected onto the trap, washed and then eluted onto and through the C-18 column with a pseudo-exponential gradient from 0 to 80% B over approximately 4 h, in the following increments: 0.1%/min for 50 min, 0.2%/min for 50 min, 0.25%/min for 40 min, 0.33%/min for 60 min, 0.44%/min for 45 min and 4%/min for 5 min (A = 0.1% formic acid; B = 0.1% formic acid in acetonitrile). The mass spectrometer was operated in data-dependent MS/MS mode with dynamic exclusion enabled. Gas-phase fractionation with three distinct scan ranges (450–600, 650–900 and 900–1600 m/z) was used to maximize the number of peptides identified, as described by Yi et al. [31]. The number of MS/MS scans varied with the scan range (1 MS + 10 MS/MS for 450–600 m/z, 1 MS + 8 MS/MS for 650–900 m/z and 1 MS + 4 MS/MS for 900–1600 m/z).
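The gradient is specified as a series of linear segments (rate × duration), so the solvent composition at each breakpoint follows by accumulation. The sketch below simply integrates the listed increments, assuming the gradient starts at 0% B. Taken literally, the segments span 250 min and accumulate to roughly 85% B rather than exactly 80%, so the stated endpoint and the rounded rates (0.33 and 0.44%/min) are presumably approximations.

```python
from typing import List, Tuple

# Gradient segments as listed in the text: (rate in %B per min, duration in min).
SEGMENTS = [(0.1, 50), (0.2, 50), (0.25, 40), (0.33, 60), (0.44, 45), (4.0, 5)]


def gradient_table(segments, start_b: float = 0.0) -> List[Tuple[float, float]]:
    """Return (time in min, %B) at the start and at the end of each segment."""
    t, b = 0.0, start_b
    points = [(t, b)]
    for rate, duration in segments:
        t += duration
        b += rate * duration
        points.append((t, round(b, 2)))  # round away float noise
    return points


for time_min, pct_b in gradient_table(SEGMENTS):
    print(f"{time_min:6.0f} min  {pct_b:5.1f} %B")
```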
Data analysis
The MS/MS spectra obtained from each LC/MS analysis were searched against a SwissProt database (release 7459) using the Sequest algorithm [32] as implemented in BioWorks 3.1 (Thermo, San Jose, CA, USA). To minimize false peptide identifications, we analysed only the most confident spectra, restricting the analysis to 38 proteins that were positively identified in two independent LC/MS experiments of a single trypsin digest. Identifications were based on the criteria presented by Washburn et al. [33]. Peptides with up to two missed cleavages were allowed. Peptides whose two ends were both unrelated to the specificity of the protease employed were assumed to be false-positive identifications and were eliminated from the data sets. Each of the 38 proteins was identified by two or more different peptides in both LC/MS experiments, and these proteins represent those with the highest consensus scores.
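The two filtering rules above (at most two missed cleavages; discard peptides whose termini are both inconsistent with the protease) can be expressed as a small post-search filter. This is a hypothetical sketch, not the BioWorks/Sequest implementation: `Enzyme`, `is_site` and `classify` are illustrative names, the specificity table is simplified, protein termini are treated as specific ends, and only the first occurrence of a peptide in the protein is examined.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Enzyme:
    residues: str         # residues after which the enzyme cuts
    blocked_by_pro: bool  # True if a following proline prevents cleavage


# Simplified specificities (illustrative assumption).
ENZYMES = {
    "trypsin": Enzyme("KR", True),
    "ArgC": Enzyme("R", False),
    "GluC": Enzyme("E", False),
    "chymotrypsin": Enzyme("FYW", True),
}


def is_site(protein: str, pos: int, enz: Enzyme) -> bool:
    """True if the bond before protein[pos] is cleavable (termini count)."""
    if pos <= 0 or pos >= len(protein):
        return True
    if protein[pos - 1] not in enz.residues:
        return False
    return not (enz.blocked_by_pro and protein[pos] == "P")


def classify(protein: str, peptide: str, enzyme: str,
             max_missed: int = 2) -> Tuple[bool, int, int]:
    """Return (keep, number_of_specific_ends, missed_cleavages)."""
    enz = ENZYMES[enzyme]
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    specific_ends = is_site(protein, start, enz) + is_site(protein, end, enz)
    # Internal cleavable bonds that were not cut are missed cleavages.
    missed = sum(is_site(protein, p, enz) for p in range(start + 1, end))
    keep = specific_ends >= 1 and missed <= max_missed
    return keep, specific_ends, missed


toy = "GASKLLRPVVKME"  # hypothetical toy protein
print(classify(toy, "LLRPVVK", "trypsin"))  # both ends tryptic: kept
print(classify(toy, "SKLL", "trypsin"))     # neither end tryptic: discarded
```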
We calculated sequence coverage for each type of protease experiment as follows. (i) For single-enzyme coverage, we combined all positively identified peptides for each protein from the two consecutive LC/MS experiments, counting any identical or shared sequences only once. The resulting unique sequence coverage was summed per analysis and converted to a percentage of the biologically competent species (i.e. excluding signal peptides and propeptides). (ii) For dual-enzyme coverage, we combined the single-enzyme coverage for each protease pair, again counting any identical or shared sequences only once, and calculated the percent coverage as before.
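The bookkeeping in (i) and (ii) amounts to taking the union of residue positions covered by each peptide set, so that identical or shared sequences are counted once, and dividing by the length of the mature chain. A minimal sketch, assuming the caller has already trimmed signal peptides and propeptides from the sequence; the function names and toy sequence are illustrative only.

```python
from typing import Iterable, Set


def covered_positions(mature: str, peptides: Iterable[str]) -> Set[int]:
    """Residue indices of `mature` covered by any peptide (all occurrences)."""
    covered: Set[int] = set()
    for pep in set(peptides):  # identical peptides count once
        start = mature.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = mature.find(pep, start + 1)
    return covered


def percent_coverage(mature: str, *peptide_sets: Iterable[str]) -> float:
    """Percent of the mature sequence covered by the union of peptide sets."""
    covered: Set[int] = set()
    for peptides in peptide_sets:
        covered |= covered_positions(mature, peptides)
    return 100.0 * len(covered) / len(mature)


mature = "ACDEFGHIKL"  # hypothetical mature chain
# (i) single enzyme: two LC/MS runs of one digest
print(percent_coverage(mature, ["ACDE", "GHI"], ["ACDE", "FG"]))
# (ii) dual enzyme: union of two single-enzyme peptide sets
print(percent_coverage(mature, ["ACDE"], ["DEFGH"]))
```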
RESULTS AND DISCUSSION
The goal of this study was to examine the differences in sequence coverage obtained from human CSF digests produced with different proteases, either individually or in combination. From this, we hope to determine the method that gives the greatest sequence coverage with the fewest experiments.
Comparison of the ion chromatograms obtained from each protease digest (Figure 1) provides insight into the numbers and lengths of the peptides obtained from each type of digest. The similarity of the trypsin and chymotrypsin chromatograms indicates that the average numbers and lengths of peptides produced by the two enzymes are similar. The long retention times of peptides from ArgC digests indicate that they are much more hydrophobic than those produced by trypsin or chymotrypsin, suggesting fewer and much longer peptides. GluC produces peptides with the widest range of retention times, and thus the most diverse array of peptide lengths. The combined ArgC–GluC digest chromatogram resembles that of GluC, as would be expected. The sequence coverage obtained for CSF proteins using GluC under both sets of experimental conditions was the same within experimental error, indicating that the different conditions do not affect the number of peptides produced from the CSF proteins. For this reason, we consider all ArgC and combined GluC–ArgC data to be directly comparable with all other results.
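The link between peptide length and cleavage-site abundance can be made concrete with a crude estimate: if cleavage residues occurred at random with overall frequency p, the mean fragment length would be about 1/p residues. The sketch below assumes approximate, rounded residue frequencies of the kind reported for UniProtKB/Swiss-Prot and ignores proline rules, missed cleavages and sequence context. It predicts the longest peptides for ArgC, in line with its long retention times, but it does not model the broad length spread observed for GluC.

```python
# Approximate residue frequencies (%), rounded; an assumption for illustration.
FREQ_PERCENT = {"K": 5.8, "R": 5.5, "E": 6.7, "F": 3.9, "Y": 2.9, "W": 1.1}

# Residues after which each protease cuts (simplified specificities).
CLEAVAGE_RESIDUES = {
    "trypsin": "KR",
    "ArgC": "R",
    "GluC": "E",
    "chymotrypsin": "FYW",
}


def expected_length(enzyme: str) -> float:
    """Mean fragment length 1/p under a random-placement model."""
    p = sum(FREQ_PERCENT[aa] for aa in CLEAVAGE_RESIDUES[enzyme]) / 100.0
    return 1.0 / p


for enzyme in CLEAVAGE_RESIDUES:
    print(f"{enzyme:13s} ~{expected_length(enzyme):4.1f} residues")
```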
The sequence coverage obtained from each LC/MS run for each of the 38 selected proteins is given in Table 1. For any individual protein substrate, total sequence coverage is quite consistent across LC/MS runs with a given protease, but the percent coverage varies widely depending on the particular combination of protease and protein substrate. The combined in silico digests (Table 2) show a similar trend. Table 3 lists the average observed coverage for the selected CSF proteins together with their known in vivo concentration ranges, which gives insight into how percent sequence coverage varies with the abundance of different protein substrates for the same protease. All proteases and combinations examined produced peptides for proteins whose concentrations differ by nearly four orders of magnitude, and the resulting coverage varies comparatively little over this range. This indicates that the availability of cleavable sites, rather than enzyme efficiency, limits amino acid sequence coverage.
The data presented in Table 4 provide the global averages of the data in Tables 1 and 2. There is clearly a wide range of sequence coverage produced by the different proteases; in terms of single-enzyme sequence coverage, trypsin ≈ GluC > chymotrypsin > ArgC.
With few exceptions, the combined GluC–ArgC digests produce coverage between that obtained with GluC and ArgC alone. We hypothesized that ArgC would increase sequence coverage by liberating peptides from regions not accessible to GluC. However, the primary effect of adding ArgC was to reduce the average size of GluC-produced peptides and, consequently, the efficiency of their capture on the peptide trap and subsequent MS detection. This suggests that proteases whose cleavage sites are naturally more abundant than those of trypsin or GluC (e.g. pepsin) will give lower coverage than either GluC or trypsin digests of CSF proteins.
The global averages given in Table 4 show that some enzyme combinations perform better than others, with the combination of GluC and trypsin giving the greatest coverage.
The in silico combination of the unshared GluC and trypsin sequence data increases the trypsin sequence coverage from 28.6 to 44.3%, a 55% improvement. Similarly, combining the unshared chymotrypsin and trypsin sequence data, and the unshared ArgC and GluC sequence data, improves the trypsin and GluC coverage by 13 and 21%, respectively.
We examined how the increased coverage from combining trypsin- and GluC-derived peptides would affect our ability to study CSF proteins. Figure 2 shows the regions covered by peptide fragments in the seven proteins surveyed. Complementing the data in Table 4, Figure 2 shows that the combined data provide considerably greater coverage of altered amino acids and post-translational modification sites than either enzyme alone. For example, combining the unshared sequence data from GluC and trypsin digests of apolipoprotein D (Figure 2) increases the sequence coverage to 74.6% (Table 4), from 29.0% with trypsin and 60.4% with GluC. In addition, applying the two enzymatic digestions allows characterization of additional sequences that contain phosphorylation sites.
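The percentages in this discussion mix absolute coverage (28.6% and 44.3%) with relative improvement (55%). A small helper makes the distinction explicit: it reproduces the 55% figure, which corresponds to an absolute gain of 15.7 percentage points, and applied to the apolipoprotein D values it gives the relative gain of the combined data over trypsin alone. The function name is illustrative.

```python
def relative_gain(single: float, combined: float) -> float:
    """Relative increase (%) of combined coverage over single-enzyme coverage."""
    return 100.0 * (combined - single) / single


# Global average: trypsin alone vs trypsin + GluC.
print(f"{relative_gain(28.6, 44.3):.0f}%")  # prints 55%
# Apolipoprotein D: trypsin alone vs trypsin + GluC.
print(f"{relative_gain(29.0, 74.6):.0f}%")  # prints 157%
```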
CONCLUSIONS
The success of bottom-up proteomics of the CSF clearly depends on the proteases employed. Although sequence coverage varies considerably among CSF protein substrates, trypsin and GluC give comparable sequence coverage, and the largest of any single protease tested. Further, combining data from two individual digests produced by different proteases significantly enhances the overall sequence coverage for most CSF proteins, with the greatest coverage achieved by combining trypsin and GluC digests. Such improved sequence coverage will aid protein identification, including the study of specific alterations in protein sequence and post-translational modifications.
FOOTNOTES
Dr Roger G. Biringer is a Senior Scientist in the Scientific Instruments Division of Thermo Electron Corporation. His research interests include proteomics of neurological disorders and the development of mass spectrometric methodologies to solve biological problems.
Heidi Amato received her Master's Degree in Chemistry from San Jose State University, San Jose, CA. She is currently an entrepreneur in Oregon.
Michael G. Harrington is the Clan Chief of the Molecular Neurology Program at Huntington Medical Research Institutes where his main focus is to elucidate the biochemistry of migraine and neurodegenerative diseases.
Alfred N. Fonteh is a Principal Investigator/Science Director of the Molecular Neurology Program at the Huntington Medical Research Institutes. His major interest is in understanding the biochemical mechanisms of neurological diseases.
James N. Riggins is a scientist in the Molecular Neurology Program and in the Liver Program at Huntington Medical Research Institutes.
Andreas F.R. Hühmer is the Program Manager for the Scientific Instruments Division at Thermo Electron in San Jose, California. His research interests include the development of mass spectrometry-based techniques and software tools to address scientific problems in Biology and Medicine.
References
- Fishman RA. Cerebrospinal fluid in diseases of the nervous system. Philadelphia: W.B. Saunders Company, 1980.
- Tourtellotte WW, Staugaitis SM, Walsh MJ, et al. The basis of intra-blood-brain-barrier IgG synthesis. Ann Neurol 1985; 17:21–7.
- Wiltfang J, Lewczuk P, Riederer P, et al. Consensus paper of the WFSBP Task Force on Biological Markers of Dementia: the role of CSF and blood analysis in the early and differential diagnosis of dementia. World J Biol Psychiatry 2005; 6:69–84.
- Harrington MG, Merril CR. Two-dimensional electrophoresis and "ultrasensitive" silver staining of cerebrospinal fluid proteins in neurological diseases. Clin Chem 1984; 30:1933–7.
- Harrington MG, Merril CR, Torrey EF. Differences in cerebrospinal fluid proteins between patients with schizophrenia and normal persons. Clin Chem 1985; 31:722–6.
- Harrington MG, Merril CR, Asher DM, Gajdusek DC. Abnormal proteins in the cerebrospinal fluid of patients with Creutzfeldt–Jakob disease. N Engl J Med 1986; 315:279–83.
- Hsich G, Kenney K, Gibbs CJ, et al. The 14-3-3 brain protein in cerebrospinal fluid as a marker for transmissible spongiform encephalopathies. N Engl J Med 1996; 335:924–30.
- Knopman DS, DeKosky ST, Cummings JL, et al. Practice parameter: diagnosis of dementia (an evidence-based review). Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2001; 56:1143–53.
- Huhmer AF, Biringer RG, Amato H, et al. Protein analysis in human cerebrospinal fluid: physiological aspects, current progress and future challenges. Dis Markers 2006; 22:211–34.
- Biringer RG, Hao Z, Harrington M, Hühmer AFR. Combined top-down, middle-down and bottom-up analysis of apo A1 and PTGDS isolated from human CSF using ETD-linear ion trap and FT-mass spectrometry. American Society for Mass Spectrometry Annual Meeting, San Antonio, TX, 2005.
- Yun M, Wu W, Hood L, Harrington M. Human cerebrospinal fluid protein database: edition 1992. Electrophoresis 1992; 13:1002–13.
- Davidsson P, Sjogren M. The use of proteomics in biomarker discovery in neurodegenerative diseases. Dis Markers 2005; 21:81–92.
- Biringer RG, Harrington MG, Stochaj W, Bondarenko P, Huhmer A, Amato H, Chu G, Swedberg S. Identification of proteins from human cerebrospinal fluid by two-dimensional electrophoresis and ion-trap mass spectrometry: improved sample preparation and presentation methods. Abstract P112-S, ABRF 2002. At: http://abrf.org/Other/ABRFMeetings/ABRF2002/2002Abstracts.html.
- Dumont D, Noben JP, Raus J, et al. Proteomic analysis of cerebrospinal fluid from multiple sclerosis patients. Proteomics 2004; 4:2117–24.
- Hakansson K, Emmett MR, Marshall AG, et al. Structural analysis of 2D-gel-separated glycoproteins from human cerebrospinal fluid by tandem high-resolution mass spectrometry. J Proteome Res 2003; 2:581–8.
- Hammack BN, Fung KY, Hunsucker SW, et al. Proteomic analysis of multiple sclerosis cerebrospinal fluid. Mult Scler 2004; 10:245–60.
- Akerman S, Goadsby PJ. Topiramate inhibits trigeminovascular activation: an intravital microscopy study. Br J Pharmacol 2005; 146:7–14.
- Ramstrom J, Hagman C, Mitchell JK, et al. Depletion of high-abundant proteins in body fluids prior to liquid chromatography fourier transform ion cyclotron resonance mass spectrometry. J Proteome Res 2005; 4:410–6.
- Raymackers J, Daniels A, De Brabandere V, et al. Identification of two-dimensionally separated human cerebrospinal fluid proteins by N-terminal sequencing, matrix-assisted laser desorption/ionization–mass spectrometry, nanoliquid chromatography-electrospray ionization-time of flight-mass spectrometry, and tandem mass spectrometry. Electrophoresis 2000; 21:2266–83.
- Wenner BR, Lovell MA, Lynn BC. Proteomic analysis of human ventricular cerebrospinal fluid from neurologically normal, elderly subjects using two-dimensional LC-MS/MS. J Proteome Res 2004; 3:97–103.
- Yuan X, Desiderio DM. Proteomics analysis of phosphotyrosyl-proteins in human lumbar cerebrospinal fluid. J Proteome Res 2003; 2:476–87.
- Yuan X, Desiderio DM. Proteomics analysis of prefractionated human lumbar cerebrospinal fluid. Proteomics 2005; 5:541–50.
- Zhang J, Goodlett DR, Peskind ER, et al. Quantitative proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol Aging 2005; 26:207–27.
- Zheng PP, Luider TM, Pieters R, et al. Identification of tumor-related proteins by proteomic analysis of cerebrospinal fluid from patients with primary brain tumors. J Neuropathol Exp Neurol 2003; 62:855–62.
- Harrington MG, Biringer RG, Huhmer AF, Fonteh AN. Cerebrospinal fluid protein composition. J Neurochem 2005; 94:17.
- Missler M, Sudhof TC. Neurexins: three genes and 1001 products. Trends Genet 1998; 14:20–6.
- Harrington MG, Fonteh AN, Biringer RG, et al. Prostaglandin D synthase isoforms from cerebrospinal fluid vary with brain pathology. Dis Markers 2006; 22:73–81.
- Aebersold RH, Leavitt J, Saavedra RA, et al. Internal amino acid sequence analysis of proteins separated by one- or two-dimensional gel electrophoresis after in situ protease digestion on nitrocellulose. Proc Natl Acad Sci USA 1987; 84:6970–4.
- Choudhary G, Wu SL, Shieh P, Hancock WS. Multiple enzymatic digestion for enhanced sequence coverage of proteins in complex proteomic mixtures using capillary LC with ion trap MS/MS. J Proteome Res 2003; 2:59–67.
- MacCoss MJ, McDonald WH, Saraf A, et al. Shotgun identification of protein modifications from protein complexes and lens tissue. Proc Natl Acad Sci USA 2002; 99:7900–5.
- Yi EC, Marelli M, Lee H, Purvine SO, Aebersold R, Aitchison JD, Goodlett DR. Approaching complete peroxisome characterization by gas-phase fractionation. Electrophoresis 2002; 23:3205–16.
- Eng JK. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 1994; 5:976–89.
- Washburn MP, Wolters D, Yates JR 3rd. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol 2001; 19:242–7.