Introduction to PDB Data
The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the position of each atom relative to the others in the molecule. They then deposit this information in the archive, where the wwPDB annotates it and releases it publicly.
The constantly growing PDB reflects the research happening in laboratories across the world. This can make the database both exciting and challenging to use in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures of ribosomes, oncogenes, drug targets, and even whole viruses. However, because the archive holds so many different structures, finding the information you need can be a challenge. You will often find multiple structures of a given molecule, partial structures, or structures that have been modified or inactivated from their native form.
The Guide to Understanding PDB Data is designed to help you chart a path through this material and avoid a few common pitfalls. Its chapters are closely interconnected. To begin, select one of the topics below:
- PDB Data
The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list the atoms in each molecule and their 3D positions in space, and they are available in several formats (PDB, mmCIF, XML). A typical PDB-format file begins with a large "header" section of text that summarizes the protein, citation information, and the details of the structure solution, followed by the sequence and a long list of atoms and their coordinates. The archive also contains the experimental observations used to determine these atomic coordinates.
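To make the layout of the atom list concrete, here is a minimal Python sketch that reads `ATOM` and `HETATM` records from a legacy PDB-format file by slicing their fixed columns. It assumes the column positions given in the wwPDB description of the PDB format; the `AtomRecord` class and the `parse_atom_line` and `read_atoms` functions are hypothetical names chosen for illustration. It skips the header entirely and does not handle mmCIF files, the only format in which very large entries are distributed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AtomRecord:
    """One ATOM or HETATM line from a legacy PDB-format file (illustrative container)."""
    record: str      # "ATOM" or "HETATM"
    serial: int
    name: str        # atom name, e.g. "CA"
    alt_loc: str     # alternate-location indicator, "" if none
    res_name: str    # residue name, e.g. "THR" or "HOH"
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float  # temperature factor
    element: str


def parse_atom_line(line: str) -> AtomRecord:
    """Slice the fixed columns of an ATOM/HETATM record (1-based columns in comments)."""
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    return AtomRecord(
        record=line[0:6].strip(),        # cols 1-6
        serial=int(line[6:11]),          # cols 7-11
        name=line[12:16].strip(),        # cols 13-16
        alt_loc=line[16].strip(),        # col 17
        res_name=line[17:20].strip(),    # cols 18-20
        chain_id=line[21].strip(),       # col 22
        res_seq=int(line[22:26]),        # cols 23-26
        x=float(line[30:38]),            # cols 31-38
        y=float(line[38:46]),            # cols 39-46
        z=float(line[46:54]),            # cols 47-54
        occupancy=float(line[54:60]),    # cols 55-60
        b_factor=float(line[60:66]),     # cols 61-66
        element=line[76:78].strip(),     # cols 77-78
    )


def read_atoms(path: str) -> list[AtomRecord]:
    """Collect every ATOM/HETATM record in a PDB-format file, ignoring header lines."""
    with open(path) as handle:
        return [parse_atom_line(text) for text in handle
                if text.startswith(("ATOM", "HETATM"))]


# A single ATOM line laid out in the fixed-column format
sample = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"
atom = parse_atom_line(sample)
print(atom.res_name, atom.chain_id, atom.res_seq, atom.name, (atom.x, atom.y, atom.z))
# THR A 1 N (17.047, 14.099, 3.625)
```

For real work, an established parser such as Biopython's `Bio.PDB` module handles many more edge cases than this sketch.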
- Visualizing Structures
While you can view PDB files directly in a text editor, it is often more useful to look at them with a browsing or visualization program. Online tools, such as those on the RCSB PDB website, let you search and explore the information in the PDB header, including the experimental methods and the chemistry and biology of the protein. Once you have found the PDB entries you are interested in, you can use a visualization program to read in the PDB file, display the structure on your computer, and create custom images of it. These programs often also include analysis tools for measuring distances and bond angles and for identifying interesting structural features.
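The distance and bond-angle measurements offered by visualization programs reduce to simple vector arithmetic on the coordinates: the distance is the Euclidean norm of the difference between two positions, and the angle at atom b is arccos of ((a - b) · (c - b)) / (|a - b| |c - b|). The sketch below assumes coordinates are `(x, y, z)` tuples in ångströms, as listed in the coordinate file; `distance` and `bond_angle` are illustrative helper names, not functions of any particular program.

```python
import math
from typing import Tuple

Coord = Tuple[float, float, float]  # (x, y, z) in ångströms


def distance(a: Coord, b: Coord) -> float:
    """Euclidean distance between two atom positions."""
    return math.dist(a, b)


def bond_angle(a: Coord, b: Coord, c: Coord) -> float:
    """Angle a-b-c in degrees, with b as the vertex atom."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    cos_theta = dot / (math.hypot(*ba) * math.hypot(*bc))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Toy coordinates chosen so the answers are easy to check by hand
print(round(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)), 3))                     # 5.0
print(round(bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)), 1))  # 90.0
```

Requires Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.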
- Reading Coordinate Files
When you start exploring the structures in the PDB archive, you will need to know a few things about the coordinate files. A typical entry contains a diverse mixture of biological macromolecules, small molecules, ions, and water. You can often use the residue names and chain IDs to sort these out. In structures determined by crystallography, each atom is annotated with a temperature factor, which describes its vibration, and an occupancy, which shows whether it is observed in more than one conformation. NMR structures often include several different models of the molecule.
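One way to sort out this mixture is sketched below. It assumes the atoms have already been parsed into a hypothetical `Atom` record and relies on three simplifying conventions: water appears with the residue name `HOH`, standard polymer residues use `ATOM` records while everything else uses `HETATM`, and alternate conformations are resolved by keeping the copy with the highest occupancy. These rules are approximations (for example, modified residues such as selenomethionine, `MSE`, are `HETATM` records even though they belong to the polymer), and all helper names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Atom(NamedTuple):
    """Hypothetical per-atom container holding only the fields used below."""
    model: int        # model number (NMR ensembles have several)
    record: str       # "ATOM" or "HETATM"
    chain_id: str
    res_name: str     # residue name, e.g. "SER", "ZN", "HOH"
    res_seq: int
    name: str         # atom name
    alt_loc: str      # alternate-location indicator, "" if none
    occupancy: float


def component_type(atom: Atom) -> str:
    """Rough classification from record type and residue name."""
    if atom.res_name == "HOH":   # water
        return "water"
    if atom.record == "ATOM":    # standard polymer residues
        return "polymer"
    return "hetero"  # ligands and ions, but also modified residues such as MSE


def first_model(atoms: List[Atom]) -> List[Atom]:
    """Keep a single model, e.g. the first member of an NMR ensemble."""
    if not atoms:
        return []
    lowest = min(a.model for a in atoms)
    return [a for a in atoms if a.model == lowest]


def pick_one_conformer(atoms: List[Atom]) -> List[Atom]:
    """Where an atom has alternate locations, keep only its highest-occupancy copy.

    Assumes no insertion codes, so (model, chain, residue number, atom name)
    identifies one atom position.
    """
    best: Dict[tuple, Atom] = {}
    for a in atoms:
        key = (a.model, a.chain_id, a.res_seq, a.name)
        if key not in best or a.occupancy > best[key].occupancy:
            best[key] = a
    return list(best.values())


def count_by_chain(atoms: List[Atom]) -> Dict[str, Dict[str, int]]:
    """Count atoms per chain ID and component type."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a in atoms:
        counts[a.chain_id][component_type(a)] += 1
    return {chain: dict(per_type) for chain, per_type in counts.items()}


# Toy records (illustrative values, not from a real entry)
atoms = [
    Atom(1, "ATOM", "A", "SER", 10, "OG", "A", 0.6),
    Atom(1, "ATOM", "A", "SER", 10, "OG", "B", 0.4),
    Atom(1, "HETATM", "A", "ZN", 301, "ZN", "", 1.0),
    Atom(1, "HETATM", "A", "HOH", 401, "O", "", 1.0),
    Atom(2, "ATOM", "A", "SER", 10, "OG", "", 1.0),
]
model1 = pick_one_conformer(first_model(atoms))
print(count_by_chain(model1))  # {'A': {'polymer': 1, 'hetero': 1, 'water': 1}}
```

The toy list mixes a second model with an alternate conformation only to exercise both filters in a few lines; real NMR ensembles and crystallographic alternate locations usually come from different entries.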
- Potential Challenges
You may run into several challenges as you explore the PDB archive. For example, many structures, particularly those determined by crystallography, include only part of the functional biological assembly. Fortunately, the PDB provides biological assembly information that can help you reconstruct the complete assembly. Also, many PDB entries are missing portions of the molecule that were not observed in the experiment. These include structures containing only alpha-carbon positions, structures with missing loops, and structures of individual domains or of subunits from a larger molecule. In addition, most crystallographic entries do not include hydrogen atoms.
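Some of these gaps can be spotted directly in the coordinates. The sketch below takes a single chain as hypothetical `(residue number, atom name, element)` tuples and flags jumps in residue numbering, chains represented only by alpha carbons (atom name `CA`), and the number of hydrogen or deuterium atoms. Numbering jumps are only a heuristic: insertion codes or unusual numbering schemes can mislead it, and PDB-format files list unobserved residues explicitly in `REMARK 465` records, which are the better source when available. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (residue number, atom name, element symbol) for the atoms of ONE chain;
# a hypothetical minimal input, e.g. built from parsed ATOM records.
ChainAtom = Tuple[int, str, str]


def numbering_gaps(res_seqs: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) ranges where residue numbers jump."""
    ordered = sorted(set(res_seqs))
    return [(prev + 1, cur - 1)
            for prev, cur in zip(ordered, ordered[1:])
            if cur - prev > 1]


def completeness_report(chain_atoms: Sequence[ChainAtom]) -> Dict[str, object]:
    """Summarize a few common kinds of incompleteness for one chain."""
    names_by_residue: Dict[int, Set[str]] = {}
    for res_seq, name, _ in chain_atoms:
        names_by_residue.setdefault(res_seq, set()).add(name)
    return {
        "gaps": numbering_gaps(list(names_by_residue)),
        # True only if every residue is represented by its alpha carbon alone
        "ca_only": bool(names_by_residue)
        and all(names == {"CA"} for names in names_by_residue.values()),
        # Hydrogen (H) or deuterium (D) atoms present in the coordinates
        "hydrogens": sum(1 for _, _, element in chain_atoms if element in ("H", "D")),
    }


# Toy alpha-carbon trace with residues 4-6 absent (illustrative, not a real entry)
trace = [(1, "CA", "C"), (2, "CA", "C"), (3, "CA", "C"), (7, "CA", "C"), (8, "CA", "C")]
print(completeness_report(trace))
# {'gaps': [(4, 6)], 'ca_only': True, 'hydrogens': 0}
```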