


Briefings in Functional Genomics and Proteomics Advance Access published online on October 23, 2007

Briefings in Functional Genomics and Proteomics, doi:10.1093/bfgp/elm020

© Oxford University Press, 2007, All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

A comparative view at comprehensive information resources on three-dimensional structures of biological macro-molecules

Rolf Hühne, Frank-Thomas Koch and Jürgen Sühnel

Corresponding author. Jürgen Sühnel, Biocomputing Group, Leibniz Institute for Age Research, Fritz Lipmann Institute (FLI), Jena Centre for Bioinformatics, Beutenbergstr 11, D-07745 Jena/Germany. Tel: +49-3641-656200; Fax: +49-3641-656210; E-mail: jsuehnel{at}fli-leibniz.de.


    ABSTRACT
The rapidly increasing amount of information on three-dimensional (3D) structures of biological macro-molecules still has an insufficient impact on genome analysis, functional genomics and proteomics as well as on many other fields in biomedicine including disease-related research. There are, however, attempts to make structural data more easily accessible to the bench biologist. As members of the world-wide Protein Data Bank (wwPDB), the RCSB Protein Data Bank (PDB), the Protein Data Bank Japan and the Macromolecular Structure Database are the primary information resources for 3D structures of proteins, nucleic acids, carbohydrates and complexes thereof. In addition, a number of secondary resources have been set up that also provide information on all currently known structures in a relatively comprehensive manner rather than focusing on specific features only. They include PDBsum, the OCA browser-database for protein structure/function, the Molecular Modeling Database and the Jena Library of Biological Macromolecules (JenaLib). Both the primary and secondary resources often merge the information in the PDB files with data from other resources and offer additional analysis tools, thereby adding value to the original PDB data. Here, we briefly describe these resources from a user's point of view and from a comparative perspective. It is our aim to guide researchers outside the structural biology field in getting the most out of the 3D structure resources.

Keywords: three-dimensional structure, proteins, nucleic acids, X-ray crystallography, NMR spectroscopy


    INTRODUCTION
The last 20 years have seen a dramatic increase in the number of known experimental structures of proteins, nucleic acids and other molecules of life. For example, in 1986 the total number of entries in the Protein Data Bank (PDB) was 214, and this number had increased to 44 700 structures by 17 July 2007 [1,2]. This huge amount of structural information is not yet fully utilized in functional genomics and proteomics. The relatively small impact of structural information outside the structural biology community is nicely captured in a joke by NCBI's Steve Bryant, quoted in an Editorial in Nature Structural Biology in 1997 [3]: What do molecular biologists fear most? Three letters: PDB.

The RCSB PDB [1,2] and more recently the world-wide Protein Data Bank (wwPDB) [4] are the single world-wide resources for three-dimensional (3D) structural information on biological macro-molecules. In the wwPDB (www.wwpdb.org), organizations that act as deposition, data processing and distribution centres for PDB data have joined forces. The founding members are the RCSB PDB (USA), the Macromolecular Structure Database (MSD-EBI, Europe) and the Protein Data Bank Japan (PDBj, Japan). The Biological Magnetic Resonance Bank (BMRB, USA) [5] group joined the wwPDB in 2006.

The PDB was established in 1971 and is thus among the oldest biological databases [1,2]. Originally, it was a simple file directory with ASCII files in the so-called PDB format. With the advent of the World-Wide Web this situation changed dramatically, and much more user-friendly resources have been established.

The first steps towards more user-friendliness were taken with simple image collections provided by SWISS-3DIMAGE [6] (www.expasy.ch/sw3d/) and by the IMB Jena Image Library of Biological Macromolecules [7] (www.imb-jena.de/IMAGE.html) as well as with browsers provided at that time by the PDB [6,8,9]. SWISS-3DIMAGE has not been developed further, but the images are still available via the UniProt database [10]. The PDB has undertaken significant efforts to improve the user-friendliness of data access [2], and the IMB Jena Image Library has also been completely redesigned and extended since 1993. In 2005, following a name change of the hosting institute, it was renamed Jena Library of Biological Macromolecules (JenaLib) (www.fli-leibniz.de/IMAGE.html). Later, PDBsum (www.ebi.ac.uk/pdbsum/) [11], OCA (ispc.weizmann.ac.il/oca-docs/oca-home.html), the PDBj (www.pdbj.org) [12] and the MSD (www.ebi.ac.uk/msd/) [13] were set up.

In addition to the comprehensive resources mentioned, there are other 3D structure databases that either offer information on a subset of the currently known structures of biological macro-molecules, such as the Nucleic Acid Database (NDB) [14], RNABase [15], the BMRB [5] and the Electron Density Server—EDS [16] or provide information on relatively specific features such as the PDBREPORT database [17] or the STING server [18]. The description of these latter databases is beyond the scope of this article.

There is a substantial overlap between the data and services offered by the comprehensive resources. However, many of them also provide unique information that is not available in the others. Information on the services as well as on the specific strengths and weak points is not easy for a potential user to obtain. The features of most of the comprehensive resources have been reported either in the database issue of Nucleic Acids Research or elsewhere. Thus far, however, there has been only one attempt at a comparative description, dating back to 2002 [19]. Recently, some of the 3D structure databases have been redesigned and significantly extended. Therefore, we think it is a good time for an updated review adopting a comparative perspective and focusing on information that is unique to a particular resource.


    COMPREHENSIVE 3D STRUCTURE DATABASES
The description starts with the wwPDB databases, followed by the other resources. The databases are listed in reverse alphabetical order. For each of the databases there is a figure showing either the full atlas page or a part of it. This provides an at-a-glance view of many features. For a few resources additional figures are displayed. In addition, database characteristics are listed and compared in Table 1. Table 2 offers information on a subset of these data focusing on unique features. Recent developments in the JenaLib database have not yet been described elsewhere. Therefore, its features are described in more detail than those of the other resources. Finally, it should be noted that the description of individual databases cannot be comprehensive. We are here primarily concerned with the atlas pages offered by all resources but occasionally also mention other analysis tools.


Table 1: Features available in the PDB, PDBj, MSD, PDBsum, OCA, MMDB and JenaLib databases

 

Table 2: Selected features of 3D structure databases of biological macromolecules with an emphasis on uniqueness

 
The RCSB Protein Data Bank (PDB)
The main PDB archive consists of structures determined using experimental methods only. Models accepted prior to 15 October 2006 are stored in a separate archive. The RCSB PDB offers structure information on five different pages titled Structure Summary, Biology & Chemistry, Materials & Methods, Sequence Details and Geometry (Figure 1). In addition to the PDB data, the Structure Summary page also provides SCOP [20], CATH [21] and PFAM [22] information. If the asymmetric and biological units differ, the user can switch between both representations. For nucleic acid containing structures the thumbnail image is taken from the NDB. Interactive visualization is possible via KiNG, Jmol (www.jmol.org), WebMol [23], the MBT (Molecular Biology Toolkit) Protein Workshop, the MBT Ligand Explorer, QuickPDB, RasMol [24] and the Swiss-PDB Viewer [25]. There is also an option for creating high-resolution images. The Biology & Chemistry page offers detailed information on the molecule under study. This includes Gene Ontology [26] and PubMed MeSH terms as well as information on associated pathways and catalytic sites. On the Materials & Methods page specific details of the structure determination method, in particular X-ray or NMR data, are displayed. The Sequence Details page offers an informative view of the sequence including information on secondary structure, disulphide bonds and SCOP domains (Figure 2). Finally, the Geometry page displays data on bond lengths, bond angles and dihedral angles. The atlas page also offers a number of external links and enables the user to display or download PDB files in ASCII, XML and mmCIF format. Navigating between the pages requires reloading.


Figure 1: RCSB PDB atlas page for lysozyme (5lyz).

 

Figure 2: RCSB PDB Sequence details page for a thrombin/hirudin IIIB complex (1z71). Only chain A is shown.

 
A unique feature of the RCSB PDB is the Structural Genomics Information Portal with information on worldwide Structural Genomics initiatives, the TargetDB and PepcDB databases that offer information on the progress of structure determination and also a list of functional annotation sites.

The advanced search option enables the building of really complex queries based on the evaluation of sub-queries (Figure 3). It is an extremely powerful search tool.


Figure 3: RCSB PDB advanced search option. The example shows the combination of three subqueries. On the left all query types are shown.
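
The idea of building a complex query from the evaluation of sub-queries can be illustrated with a minimal sketch. It is not the RCSB PDB implementation: the records, their field names and the helper functions `run_subquery` and `combine_and` are hypothetical and serve only to show how three independently evaluated sub-queries are combined by intersecting their result sets.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy records; IDs and field names are made up for illustration
# and do not reflect the schema of any real structure database.
ENTRIES: List[Dict[str, object]] = [
    {"id": "0aaa", "method": "X-RAY", "resolution": 1.8, "has_ligand": True},
    {"id": "0bbb", "method": "NMR", "resolution": None, "has_ligand": True},
    {"id": "0ccc", "method": "X-RAY", "resolution": 2.9, "has_ligand": False},
    {"id": "0ddd", "method": "X-RAY", "resolution": 1.5, "has_ligand": True},
]


def run_subquery(predicate: Callable[[Dict[str, object]], bool]) -> Set[str]:
    """Evaluate one sub-query and return the set of matching entry IDs."""
    return {str(e["id"]) for e in ENTRIES if predicate(e)}


def combine_and(*results: Set[str]) -> Set[str]:
    """Combine sub-query results with a logical AND (set intersection)."""
    return set.intersection(*results) if results else set()


# Three sub-queries, evaluated independently of each other.
xray = run_subquery(lambda e: e["method"] == "X-RAY")
high_res = run_subquery(
    lambda e: e["resolution"] is not None and e["resolution"] <= 2.0
)
with_ligand = run_subquery(lambda e: bool(e["has_ligand"]))

combined = combine_and(xray, high_res, with_ligand)
print(sorted(combined))
```

Because each sub-query yields a plain set of identifiers, an OR combination would simply use a set union instead of the intersection.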

 
Last but not least, educational information and site tutorials are also available. This includes, for example, the very well designed Molecule-of-the-Month series.

The Protein Data Bank Japan (PDBj)
The PDBj was set up in 2000 [12]. For experimental structures it provides information on atlas pages, whereas for theoretical models only links to the PDB files are available. This is in line with the current PDB policy. There is a summary page and, in addition, pages on structural details, experimental details, functional details and sequence neighbours, as well as a download/display page and a links page (Figure 4). To a large extent the information shown on these pages is taken from the PDB file. However, there are also additional data. The functional details page also offers functional information from Gene Ontology data [26], from UniProt [10] and from PROSITE [27]. Moreover, catalytic information from the Catalytic Site Atlas (CSA) [28] and the Catalytic Residue Dataset (CATRES) [29] is offered. The sequence neighbour page lists all PDB entries with the same UniProt sequence (exact matches) and also other structures with related sequences. The download/display page contains links for downloading the PDB files in ASCII, XML and mmCIF format as well as structure factors, if available. On the summary page, mono representations of the structure are available in three different orientations and two different sizes. There are also visualization options for the electron density map and for the structure with two different Java viewers. They need the Java(TM) Plug-in 1.4 and Java3D 1.3 or the JOGL library 1.0.


Figure 4: PDBj atlas page for lysozyme (5lyz).

 
PDBj has a QuickSearch and more sophisticated search options, both based on PDB XML files. In QuickSearch, a search is possible for the PDB ID and for any other text string; the search apparently does not only scan the KEYWD record but covers the complete header of the PDB file. In addition to a more or less standard Advanced Search, there are also more sophisticated tools called XQuery/XPath Search.
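
To give an impression of what an XPath-style search over XML-formatted structure data looks like, the following minimal sketch runs a path query with Python's standard library. It does not use the PDBj service or the real PDB XML schema: the toy document, its element names (`entry`, `title`, `keywords`, `method`) and the function `ids_by_method` are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Toy XML with made-up element names and IDs; the real PDB XML files use a
# different, namespaced schema that is not reproduced here.
TOY_XML = """
<entries>
  <entry id="0aaa">
    <title>Toy lysozyme structure</title>
    <keywords>HYDROLASE</keywords>
    <method>X-RAY DIFFRACTION</method>
  </entry>
  <entry id="0bbb">
    <title>Toy thrombin complex</title>
    <keywords>HYDROLASE/HYDROLASE INHIBITOR</keywords>
    <method>X-RAY DIFFRACTION</method>
  </entry>
  <entry id="0ccc">
    <title>Toy DNA duplex</title>
    <keywords>DNA</keywords>
    <method>SOLUTION NMR</method>
  </entry>
</entries>
"""

ROOT = ET.fromstring(TOY_XML)


def ids_by_method(root: ET.Element, method: str) -> list:
    """Return the IDs of all entries whose <method> child equals `method`.

    ElementTree implements only a subset of XPath; the [child='text']
    predicate used here is part of that subset.
    """
    path = f".//entry[method='{method}']"
    return [entry.get("id") for entry in root.findall(path)]


print(ids_by_method(ROOT, "X-RAY DIFFRACTION"))
print(ids_by_method(ROOT, "SOLUTION NMR"))
```

The point of such queries is that they address a specific element of the structured file, whereas a free-text search only reports that a string occurs somewhere in the header.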

The Macromolecular Structure Database (MSD)
The MSD at the EBI offers information on experimental and theoretical structures and also provides atlas pages for superseded entries [13]. The information for a particular entry is displayed on six different pages titled Summary, Assembly, Sequence, Citation, Similarity and Visualization (Figure 5). The advantage of this data organization is that the information is clearly arranged. On the other hand, the server has to be contacted to (re)load a page when navigating through the pages, which slows down the work. Assembly information is not taken from the PDB as in the other wwPDB resources but from the Protein Quaternary Structure Server PQS [30].


Figure 5: MSD atlas page for lysozyme (5lyz).

 
The sequence page offers a UniProt/PDB alignment (Figure 6). The citation page provides a complete list of primary and secondary citations of a particular entry as well as of references for related entries. On the similarity page, an especially useful feature is the structural comparison to all other PDB structures via the MSD fold system. Finally, the visualization page offers the Astex Viewer, RasMol and a JenaLib link as visualization options. MSD offers an impressive number of additional analysis tools called MSDlite, MSDpro, MSDmotif, MSDtemplate, MSDpisa, MSDchem, MSDmine, MSDsite, MSDfold, MSDtarget and MSDanalysis. To mention only a few: both MSDlite and MSDpro represent excellent search interfaces, MSDsite is a database search and retrieval system for the analysis of ligands and active sites [31], MSDmotif tries to identify small motifs occurring in 3D structures and MSDfold is a structure matching tool.


Figure 6: MSD UniProt/PDB alignment view for a thrombin/hirudin IIIB complex (1z71).

 
PDBsum
The PDBsum database was created at University College London in 1995 and has now moved to the European Bioinformatics Institute [32]. Database entries can be accessed by the PDB ID, by a text search in the TITLE, HEADER, COMPND, SOURCE and AUTHOR records of the PDB files or by sequence.

The database can also be browsed by a number of lists. They include PDB codes, ligands, the Enzyme Classification (E.C.) scheme, PROSITE patterns, species and a highlights page with various categories of entries such as the oldest, newest, largest and smallest ones, for example. Directly from the atlas page a PROCHECK job can be started [33].

At the top of the atlas page, both asymmetric and biological unit thumbnails are shown for crystallographic structures (Figure 7). Interactive visualization is possible via Jmol (www.jmol.org), RasMol [24] and the Astex Viewer. There is also an option to use RasMol for structure orientation and Molscript [34] and Raster3D [35] for the generation of high-quality images. Finally, users can upload their own structures and visualize them with the PDBsum tools. This latter option is not available in other resources.


Figure 7: PDBsum atlas page for lysozyme (5lyz).

 
Compact information on chains, ligands and solvent molecules is given on the atlas page. PDBsum offers a very nice view of sequences with information on secondary structure, residue conservation and residue contacts to the nucleic acid part in the case of protein–nucleic acid complexes (Figure 8). There is also a topology view for proteins and a NUCPLOT [36] view for nucleic acid parts interacting with a protein. Finally, surface representation is available and a cleft analysis is also provided.


Figure 8: PDBsum sequence and secondary structure information for a thrombin/hirudin IIIB complex (1z71). Only chain A is shown.

 
A very interesting and unique PDBsum feature is the availability of key figures extracted from the literature [37]. In many cases, author generated images have much higher information content than other ones. So, this is really an extremely useful and informative feature.

The OCA database
The OCA database for protein structure and function is hosted by the Weizmann Institute of Science. OCA offers both a PDBlite search and an advanced search option. In the advanced search gene and disease names can, for example, be used and there is also a possibility for a sequence search. An advantage of the search functionality of OCA is the usage of a thesaurus and an automatic spelling verification in the query mechanism. Database entries can be accessed either by the PDB ID or a text search in the TITLE, HEADER, COMPND, SOURCE and AUTHOR records of the PDB files or also by sequence.

The structure information for an entry is contained in a single page (Figure 9). There is a detailed compound description. The remaining part is divided into a number of subsections titled Data retrieval, View in 3D, Visual 3D analysis, Structure-derived information, Sequence-derived information and Movements, Other resources and Movements, Movies and Images. 3D structure visualization is possible via Jmol (www.jmol.org), AstexViewer and RasMol [24]. Both CATH [21] and SCOP [20] information is part of the atlas page. Information on SCOP domains can also be visualized in the 3D structure by Jmol and RasMol. A particular OCA strength is the relatively detailed disease information.


Figure 9: OCA atlas page for lysozyme (5lyz).

 
OCA is open to be used for automatic access, returning XML and plain text results that allow easy integration into other software, web servers or batch queries from large sequencing or proteomics centres. A list of reporting formats and tags is available from bip.weizmann.ac.il/oca-docs/faq.html.

The Molecular Modeling Database (MMDB)
NCBI's structure database is called MMDB and it offers information on all experimentally determined 3D structures of biological macro-molecules [38]. Visualization is possible via the Cn3D viewer (www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml) and RasMol [24]. Compared to the other databases, the information content of the atlas pages is rather small (Figure 10). A very useful feature is, however, the information on structural neighbours derived from the VAST algorithm [39]. Also, thanks to the use of the PDBeast system (130.14.29.110/Structure/PDBEAST/pdbeast.shtml), species information is especially reliable and avoids the inconsistencies of the original PDB data.


Figure 10: MMDB atlas page for lysozyme (5lyz).

 
The Jena Library of Biological Macromolecules (JenaLib)
The JenaLib database was set up in 1993, originally as a gopher-accessible image archive under the name IMB Jena Image Library of Biological Macromolecules [40]. In 1998, a data pipeline was established that combined automatic and manual processing. It generated HTML atlas pages for all database entries. The database consists of two major sections, the atlas of macro-molecule structures and basic information on the architecture of biopolymers. The latter includes, for example, subsections on experimental methods for structure determination and on nucleic acid nomenclature and structure as well as an amino acid repository.

The JenaLib database provides atlas pages for all entries from the PDB and NDB databases. So, the small number of structures available from the NDB only can also be accessed via the JenaLib database. Both experimental and theoretical structures are included. When searching for superseded entries the user is automatically forwarded to the new structure.

To improve data integration, a database has been built that contains additional data besides the data from the original PDB files, e.g. Gene Ontology information [26], structural data from the SCOP [20] and CATH [21] classification schemes, sequence data from the PROSITE database [27], single amino acid polymorphisms from the UniProt variant pages and also information from the GenAge database [41]. The basis for mapping sequence data onto structures is an automatic sequence alignment between UniProt and PDB sequences. The PDB sequence is taken directly from the coordinates and not from the SEQRES record. The alignment is displayed in an alignment viewer. It highlights mismatches, gaps, modified residues and numbering irregularities in PDB files. The JenaLib alignment can be directly compared to the data shown on the MSD sequence page (Figure 11).


Figure 11: JenaLib UniProt/PDB alignment view for a thrombin/hirudin IIIB complex (1z71).
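
The difference between a sequence taken from the coordinates and one taken from the SEQRES record can be illustrated with a minimal sketch. This is not the JenaLib pipeline. It relies only on the fixed column layout of the PDB format (residue name in columns 18-20, chain identifier in column 22, residue number in columns 23-26 of ATOM records); the toy records, the reduced residue table and all function names are assumptions made for illustration.

```python
# Reduced three-letter to one-letter table; a real tool needs the full set
# and a policy for modified residues.
THREE_TO_ONE = {"ALA": "A", "GLY": "G", "LYS": "K", "MET": "M", "SER": "S"}


def seqres_line(ser_num, chain, num_res, residues):
    """Build a toy SEQRES record (residue names start in column 20)."""
    return f"SEQRES {ser_num:3d} {chain} {num_res:4d}  " + " ".join(residues)


def atom_line(serial, res_name, chain, res_seq):
    """Build a toy C-alpha ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:5d}  CA  {res_name:>3} {chain}{res_seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def sequence_from_seqres(lines, chain):
    """Sequence of the molecule as declared in the SEQRES records."""
    seq = []
    for line in lines:
        if line.startswith("SEQRES") and line[11] == chain:
            seq.extend(THREE_TO_ONE.get(r, "X") for r in line[19:70].split())
    return "".join(seq)


def sequence_from_coordinates(lines, chain):
    """Sequence of the residues that actually have coordinates."""
    seen, seq = set(), []
    for line in lines:
        if not line.startswith("ATOM") or line[21] != chain:
            continue
        key = (line[22:26], line[26])  # residue number and insertion code
        if key not in seen:
            seen.add(key)
            seq.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
    return "".join(seq)


# Toy chain A: five residues declared, but only residues 2-4 were modelled.
TOY = [
    seqres_line(1, "A", 5, ["MET", "GLY", "SER", "ALA", "LYS"]),
    atom_line(1, "GLY", "A", 2),
    atom_line(2, "SER", "A", 3),
    atom_line(3, "ALA", "A", 4),
]

print(sequence_from_seqres(TOY, "A"))
print(sequence_from_coordinates(TOY, "A"))
```

In the toy example the terminal residues appear in SEQRES but have no coordinates, so an alignment against UniProt based on the coordinates shows them as gaps in the structure.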

 
The complete information for a particular structure is shown on one page. There is, however, an expand/collapse mechanism that initially hides the content of most sections (Figure 12) but also allows quick access to the complete information without any additional time-consuming server contacts and tedious clicking through many pages.


Figure 12: JenaLib atlas page for lysozyme (5lyz).

 
Visualization was and is one of the strengths of the JenaLib database. It therefore offers a large number of manually generated images. Some of them have been used in newspapers, exhibitions, books and as journal cover images. For example, the journal RNA has used JenaLib molecule representations for all issues since 1993. For all structures, the database also offers automatically generated mono and stereo MolScript images [34] in PDF format. A Virtual Reality Modeling Language viewer can be used for modifying the default orientation. Interactive visualization is possible via RasMol [24] and Chime as well as with the Java-based WebMol [23] and Astex Viewers (www.astex-therapeutics.com/AstexViewer/index.php).

There is also a unique JenaLib viewer based on the platform-independent open-source viewer Jmol (www.jmol.org). The JenaLib Jmol viewer offers a great deal of selection and rendering options by a simple point-and-click mechanism. In addition, there is also a command-line interface to provide access to the Jmol scripting language for advanced users (Figure 13). The viewer gets data from the underlying database mentioned earlier that includes a great deal of information beyond the PDB data. This makes the JenaLib Jmol viewer a unique visualization tool with many options not available in other 3D structure resources. For example, there are a number of standard views that highlight hetero components and sites, PROSITE motifs, SAPs (single amino acid polymorphisms) and the SCOP or CATH domain structure. A unique JenaLib feature is also the rendering of both the CATH and SCOP domain structure within one view. This domain structure can be shown for both asymmetric and biological units.


Figure 13: Alkaline protease (1akl) view created with the JenaLib Jmol viewer. Thick ribbon: PROSITE ZINC_PROTEASE motif; large ball: water oxygen atom from the active site (CAT); small ball: Zn; thin sticks: active site amino acids His176, His180, Tyr216; thick sticks: mutation G167A (wild type: dark, mutant: bright).

 
The standard views can be modified and combined by either a basic or an advanced interface. In the basic interface, any change in the selection of a pull down menu automatically triggers an action (one-step mechanism). View and selection are coupled in the structure-specific controls. On the other hand, in the advanced interface any change in the selection of a pull down menu only sets the target for the corresponding control buttons (two-step mechanism). View and selection are set independently in the structure-specific controls. There is also an option for exploring individual NMR structure models. The information on biological units is taken from the PDB and not from the PQS server [30].

A further interesting JenaLib option is a content-highlighting system of PDB files which makes the navigation through these files much easier.

The JenaLib offers a number of pre-computed entry lists but also includes an option for the generation of fully customizable lists (Figure 14). Entry sets can be selected according to database (PDB and/or NDB), method (X-Ray/neutron/synchrotron, NMR, electron, other experimental and theoretical model), molecule type (protein, DNA, etc.) or other features such as the occurrence of modified residues, ligands/ions, SAPs and PROSITE motifs as well as availability of information from the SCOP, CATH, OMIM [42] or GenAge databases. The user can select the desired output columns (currently from a list of 26 columns) and define the sort order of the output according to information in any of the columns. The output format is either HTML or ASCII (tab-separated text). Such entry lists can also be generated from QuickSearch results.


Figure 14: JenaLib entry list customization.
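
The principle of a customizable entry list (filter the entries, choose the output columns, define the sort order and write tab-separated text) can be sketched in a few lines. This is not the JenaLib code: the records, the column names and the function `build_entry_list` are hypothetical, and only a single filter criterion is shown.

```python
import csv
import io

# Hypothetical records with made-up IDs; the real database offers many more
# columns and selection criteria.
ENTRIES = [
    {"id": "0aaa", "method": "X-RAY", "molecule": "protein", "year": 1999},
    {"id": "0bbb", "method": "NMR", "molecule": "DNA", "year": 2003},
    {"id": "0ccc", "method": "NMR", "molecule": "protein", "year": 1996},
]


def build_entry_list(entries, columns, sort_by, method=None):
    """Return a tab-separated entry list as a single string.

    entries: list of dicts, one per database entry
    columns: names of the output columns, in the desired order
    sort_by: name of the column that defines the sort order
    method:  optional filter on the structure determination method
    """
    rows = [e for e in entries if method is None or e["method"] == method]
    rows.sort(key=lambda e: e[sort_by])
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)  # header line
    for entry in rows:
        writer.writerow([entry[c] for c in columns])
    return buffer.getvalue()


# NMR entries only, two output columns, sorted by year.
nmr_list = build_entry_list(ENTRIES, ["id", "year"], sort_by="year", method="NMR")
print(nmr_list, end="")
```

Tab-separated output of this kind can be loaded directly into spreadsheet programs or processed further with scripts.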

 
QuickSearch is a user-friendly Google-like search option with only one input field. Nevertheless, owing to an automatic recognition mechanism, this option allows searching for PDB and NDB IDs, UniProt IDs and accession numbers, PROSITE IDs and accession numbers as well as for other search strings. The search space comprises the database codes mentioned and the HEADER, STRUCTURE, TITLE, KEYWD, EXPDATA, HETNAME and HET, JRNL, COMPND and SOURCE records of the PDB file. The search is also performed in all sub-records such as auth, titl, ref, refn for JRNL or organism_scientific, organism_common, cellular_location, expression_system, cell_line, tissue for the SOURCE record.
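
How an automatic recognition mechanism for a single input field might work can be sketched with a few regular expressions. The following is an assumption-laden illustration and not the JenaLib QuickSearch code: the function `classify_query` and the category labels are hypothetical, the UniProt ID pattern is only an approximation, and NDB IDs and PROSITE IDs are deliberately left out because they cannot be recognized reliably from their shape alone and would need a lookup.

```python
import re

# Ordered list of (label, pattern). The UniProt accession pattern follows the
# format published by UniProt; the other patterns are simplified assumptions.
PATTERNS = [
    ("pdb_id", re.compile(r"[0-9][A-Za-z0-9]{3}")),
    ("prosite_accession", re.compile(r"PS[0-9]{5}")),
    ("uniprot_accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
    ("uniprot_id", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def classify_query(query: str) -> str:
    """Guess what kind of identifier a query string is.

    Returns one of the labels in PATTERNS, or "text" if the string does not
    look like any known identifier and should be used for a free-text search.
    """
    token = query.strip()
    for label, pattern in PATTERNS:
        # PDB IDs are matched case-insensitively via the character class;
        # the other identifiers are compared in upper case.
        candidate = token if label == "pdb_id" else token.upper()
        if pattern.fullmatch(candidate):
            return label
    return "text"


for example in ["5lyz", "P00698", "LYSC_CHICK", "PS00142", "thrombin inhibitor"]:
    print(example, "->", classify_query(example))
```

A string that is recognized as an identifier can be looked up in the corresponding database code table, while everything else falls through to the text search in the PDB records listed above.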

In the hit list, the occurrence of the search terms is indicated with different colours for different terms. This makes the search results really transparent. There is also an advanced search option allowing searches in specific database sections. This option needs to be further improved, however, in order to enable complex queries.

The JenaLib atlas pages contain a large number of linked cross references to a total of 50 external databases and analysis tools with information on a particular structure and using the PDB/NDB ID, UniProt ID/accession number, Enzyme number, OMIM disease ID or GenAge ID for linking. The linked analysis tools include, for example, protein disorder predictors and the PDB cartoon tool.


    COMPARING THE RESOURCES
It is difficult for a potential user to identify all the either overlapping or unique features of the 3D structure databases. We have, therefore, compiled Table 1 with compact information on features available in 3D structure databases. An additional Table 2 offers information on a subset of these data with an emphasis on uniqueness.

A few remarks should be added. In principle, all resources should contain exactly the same information from the PDB files. This is, however, not completely true because there are currently three different PDB file formats: the original ASCII format, XML and mmCIF. Unfortunately, the data content of these files is not identical. One difference concerns the chain identifiers. In the original PDB files chain identifiers are occasionally not indicated; see, for example, the lysozyme structure with the PDB ID 5lyz. In the XML and mmCIF files, on the other hand, all structures have chain identifiers. In the case mentioned this is A. So, the chain information displayed on the atlas page depends on the file type used. In the lysozyme case chain identifier A is given by MSD and OCA, whereas no chain identifier is indicated by any of the other resources.
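
The chain identifier issue can be made concrete with a minimal sketch that collects the chain identifiers from coordinate records in the ASCII PDB format, where the identifier occupies column 22. The records below are toy data with dummy coordinates, not excerpts from the 5lyz files, and the helper names `atom_record` and `chain_ids` are hypothetical.

```python
def atom_record(serial, atom, res_name, chain_id, res_seq):
    """Build a toy ATOM record in PDB column layout with dummy coordinates."""
    return (f"ATOM  {serial:5d} {atom:<4} {res_name:>3} {chain_id:1}"
            f"{res_seq:4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


def chain_ids(pdb_lines):
    """Return the set of chain identifiers (column 22) found in ATOM/HETATM records."""
    ids = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            ids.add(line[21])
    return ids


# Toy file without chain identifier: column 22 is blank.
without_chain = [
    atom_record(1, " N  ", "LYS", " ", 1),
    atom_record(2, " CA ", "LYS", " ", 1),
]

# The same toy atoms with chain identifier A assigned.
with_chain = [
    atom_record(1, " N  ", "LYS", "A", 1),
    atom_record(2, " CA ", "LYS", "A", 1),
]

print(chain_ids(without_chain))
print(chain_ids(with_chain))
```

A resource that reads the first kind of file reports a blank chain, whereas one that reads a file with assigned identifiers reports chain A, which is exactly the kind of discrepancy between atlas pages described above.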

Fortunately, a switch to remediated PDB data has been announced for the period after July 2007, and this will hopefully remove the differences currently observed.

Chain identifiers may also change when passing from asymmetric to biological units. The structure databases use information on biological units from two sources, either directly from the RCSB PDB or from the PQS server. It is important to realize that these sources may provide different results. For example, in the lysozyme case the predicted biological unit is dimeric according to the PQS server, but monomeric according to the PDB.

Search results also depend critically on search space and procedure adopted by the different databases. Extreme differences can be seen, for example, if one tries to identify the number of ‘to be published’ entries. A search performed on 20 July returned the following results: PDBj Keyword: 7848 entries, RCSB PDB: 8151 entries, MSDlite Text Search: 673 entries, JenaLib QuickSearch: 9877 entries.

Because of these potential differences it is always a good idea to try to obtain a specific piece of information from different databases. Despite these problems the overwhelming majority of 3D structural data is identical for the resources described.


    CONCLUSIONS
Information on 3D structures of biological macro-molecules is available from the primary resource, the RCSB PDB and from a number of secondary resources. In recent years, strong efforts have been undertaken by these resources towards extended data integration, better data uniformity and improved analysis tools with an emphasis on user-friendliness. Even though there is a substantial overlap between these databases, all of them have their unique features. Taken together, the 3D structural resources represent a rich and easily accessible information source that can be used both by the structural biologist and by the bench biologist. Challenges for the future are the development of improved navigation tools through the increasing amount of structural data and data integration ranging from the genomic level over protein sequences and structures and pathways to disease information.


Key Points

  • The primary information resource, the RCSB Protein Data Bank, as well as the secondary resources represent a rich archive of 3D structural information.
  • The rapidly increasing information on 3D structures of biological macro-molecules is not yet fully utilized in functional genomics and proteomics.
  • Most of the user interfaces of 3D structure databases are now designed in a user-friendly manner and can easily be used by the bench biologist.
  • This should lead to a stronger impact of 3D structural information on proteins and nucleic acids in the field of functional genomics and proteomics.

 


    ACKNOWLEDGEMENTS
The article was funded by Nationales Genomforschungsnetz NGFN-2: Grant 01 GR 0457 to J.S. The authors are grateful to Kristina Mehliss, who helped with thumbnail image generation, to Friedrich Haubensak for maintaining the web server and to Stefan Westermeier for working on the QuickSearch option during an internship.


    FOOTNOTES
 
Rolf Hühne is a Postdoc in the FLI Biocomputing Group.

Frank-Thomas Koch is a Postdoc in the FLI Biocomputing Group.

Jürgen Sühnel is Head of the FLI Biocomputing Group.


    REFERENCES

  1. Bernstein FC, Koetzle TF, Williams GJ, et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol (1977) 112:535–42.
  2. Bourne PE, Westbrook J, Berman HM. The Protein Data Bank and lessons in data management. Brief Bioinform (2004) 5:23–30.
  3. Editorial. Nat Struct Biol (1997) 4:329–30.
  4. Berman H, Henrick K, Nakamura H, et al. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res (2007) 35:D301–3.
  5. Doreleijers JF, Mading S, Maziuk D, et al. BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank. J Biomol NMR (2003) 26:139–46.
  6. Peitsch MC, Wells TN, Stampf DR, et al. The Swiss-3DImage collection and PDB-Browser on the World-Wide Web. Trends Biochem Sci (1995) 20:82–4.
  7. Suhnel J. Image library of biological macromolecules. Comput Appl Biosci (1996) 12:227–9.
  8. Stampf DR, Felder CE, Sussman JL. PDBBrowse - a graphics interface to the Brookhaven protein data bank. Nature (1995) 374:572–4.
  9. Sussman JL, Lin D, Jiang J, et al. Protein data bank (PDB): a database of 3D structural information of biological macromolecules. Acta Crystallogr D Biol Crystallogr (1998) 54:1078–84.
  10. Wu CH, Apweiler R, Bairoch A, et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res (2006) 34:D187–91.
  11. Laskowski RA, Hutchinson EG, Michie AD, et al. PDBsum: a web-based database of summaries and analyses of all PDB structures. Trends Biochem Sci (1997) 22:488–90.
  12. Nakamura H, Ito N, Kusunoki M. [Development of PDBj: Advanced database for protein structures]. Tanpakushitsu Kakusan Koso (2002) 47:1097–101.
  13. Velankar S, McNeil P, Mittard-Runte V, et al. E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res (2005) 33:D262–5.
  14. Berman HM, Westbrook J, Feng Z, et al. The Nucleic Acid Database. Acta Crystallogr D Biol Crystallogr (2002) 58:889–98.
  15. Murthy VL, Rose GD. RNABase: an annotated database of RNA structures. Nucleic Acids Res (2003) 31:502–4.
  16. Kleywegt GJ, Harris MR, Zou JY, et al. The Uppsala Electron-Density Server. Acta Crystallogr D Biol Crystallogr (2004) 60:2240–9.
  17. Hooft RW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature (1996) 381:272.
  18. Neshich G, Mazoni I, Oliveira SR, et al. The Star STING server: a multiplatform environment for protein structure analysis. Genet Mol Res (2006) 5:717–22.
  19. Weissig H, Bourne PE. Protein structure resources. Acta Crystallogr D Biol Crystallogr (2002) 58:908–15.
  20. Andreeva A, Howorth D, Brenner SE, et al. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res (2004) 32:D226–9.
  21. Greene LH, Lewis TE, Addou S, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res (2007) 35:D291–7.
  22. Finn RD, Mistry J, Schuster-Bockler B, et al. Pfam: clans, web tools and services. Nucleic Acids Res (2006) 34:D247–51.
  23. Walther D. WebMol—a Java-based PDB viewer. Trends Biochem Sci (1997) 22:274–5.
  24. Sayle RA, Milner-White EJ. RASMOL: biomolecular graphics for all. Trends Biochem Sci (1995) 20:374.
  25. Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis (1997) 18:2714–23.
  26. Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res (2004) 32:D258–61.
  27. Hulo N, Bairoch A, Bulliard V, et al. The PROSITE database. Nucleic Acids Res (2006) 34:D227–30.
  28. Porter CT, Bartlett GJ, Thornton JM. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucleic Acids Res (2004) 32:D129–33.
  29. Bartlett GJ, Porter CT, Borkakoti N, et al. Analysis of catalytic residues in enzyme active sites. J Mol Biol (2002) 324:105–21.
  30. Henrick K, Thornton JM. PQS: a protein quaternary structure file server. Trends Biochem Sci (1998) 23:358–61.
  31. Golovin A, Dimitropoulos D, Oldfield T, et al. MSDsite: a database search and retrieval system for the analysis and viewing of bound ligands and active sites. Proteins (2005) 58:190–9.
  32. Laskowski RA, Chistyakov VV, Thornton JM. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res (2005) 33:D266–8.
  33. Laskowski RA, MacArthur MW, Moss DS, et al. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr (1993) 26:283–91.
  34. Kraulis PJ. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr (1991) 24:946–50.
  35. Merritt EA, Bacon DJ. Raster3D: photorealistic molecular graphics. Methods Enzymol (1997) 277:505–24.
  36. Luscombe NM, Laskowski RA, Thornton JM. NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Res (1997) 25:4940–5.
  37. Laskowski RA. Enhancing the functional annotation of PDB structures in PDBsum using key figures extracted from the literature. Bioinformatics (2007) 23:1824–7.
  38. Wang Y, Addess KJ, Chen J, et al. MMDB: annotating protein sequences with Entrez's 3D-structure database. Nucleic Acids Res (2007) 35:D298–300.
  39. Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol (1996) 6:377–85.
  40. Reichert J, Suhnel J. The IMB Jena Image Library of Biological Macromolecules: 2002 update. Nucleic Acids Res (2002) 30:253–4.
  41. de Magalhaes JP, Costa J, Toussaint O. HAGR: the human ageing genomic resources. Nucleic Acids Res (2005) 33:D537–43.
  42. McKusick VA. Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. 12th edn. Baltimore: Johns Hopkins University Press (1998).
