Guide to Understanding PDB Data
Beginner’s Guide to PDB Structures and the PDBx/mmCIF Format
Dealing with Coordinates
Biological Assemblies
Missing Coordinates and Biological Assemblies
Primary Sequences and the PDB Format
Hierarchical Structure of Proteins
Exploring Carbohydrates in the PDB Archive
Small Molecule Ligands
Molecular Graphics Programs
Methods for Determining Structure
Computed Structure Models
R-value and R-free
Structure Factors and Electron Density
Introduction to RCSB PDB APIs

Introduction to PDB Data

The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom in the molecule relative to the others. They then deposit this information, which the wwPDB annotates and publicly releases into the archive.

The constantly growing PDB reflects the research happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, because the PDB archives so many different structures, it can be a challenge to find the information that you need. You will often find multiple structures for a given molecule, partial structures, or structures that have been modified or inactivated from their native form.

The Guide to Understanding PDB Data is designed to help you chart a path through this material and avoid a few common pitfalls. Its chapters are closely interrelated. To begin, choose a topic from the menu on the left, or select a topic linked in the paragraphs below:

  • PDB Data

    The primary information stored in the PDB archive consists of coordinate files for biological molecules. These files list the atoms in each protein and their locations in 3D space. They are available in several formats (PDB, mmCIF, XML). A typical PDB-formatted file includes a large "header" section of text that summarizes the protein, citation information, and the details of the structure solution, followed by the sequence and a long list of the atoms and their coordinates. The archive also contains the experimental observations that are used to determine these atomic coordinates.

  • Visualizing Structures

    While you can view PDB files directly in a text editor, it is usually more useful to look at them with a browsing or visualization program. Online tools, such as the ones on the RCSB PDB website, allow you to search and explore the information in the PDB header, including information on experimental methods and the chemistry and biology of the protein. Once you have found the PDB entries that you are interested in, you can use a visualization program to read in the PDB file, display the protein structure on your computer, and create custom pictures of it. These programs often also include analysis tools that allow you to measure distances and bond angles and to identify interesting structural features.

  • Reading Coordinate Files

    When you start exploring the structures in the PDB archive, you will need to know a few things about the coordinate files. In a typical entry, you will find a diverse mixture of biological molecules, small molecules, ions, and water. Often, you can use the names and chain IDs to help sort these out. In structures determined by crystallography, atoms are annotated with temperature factors that describe their vibration and with occupancies that show whether they are seen in several conformations. NMR structures often include several different models of the molecule.

  • Potential Challenges

    You may run into several challenges as you explore the PDB archive. For example, many structures, particularly those determined by crystallography, only include information about part of the functional biological assembly. Fortunately, the PDB can help with this. Also, many PDB entries are missing portions of the molecule that were not observed in the experiment. Examples are structures that contain only alpha carbon positions, structures with missing loops, and structures of individual domains or subunits from a larger molecule. In addition, most crystallographic structure entries do not have information on hydrogen atoms.
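
To make the points above more concrete, here is a minimal Python sketch of how you might read the atom records of a PDB-formatted file and check for some of the features and pitfalls described in this overview. It assumes the fixed-width column layout of the legacy PDB format (mmCIF files are organized differently and need a different reader). The function names `parse_atom_line`, `make_record`, and `summarize`, and the `SAMPLE` records with their coordinates, are made up for illustration and are not part of any RCSB PDB tool. Treat it as a starting point rather than a complete parser.

```python
from collections import defaultdict


def parse_atom_line(line):
    """Slice one fixed-width ATOM/HETATM record from a legacy PDB-format file."""
    return {
        "record": line[0:6].strip(),      # "ATOM" or "HETATM"
        "name": line[12:16].strip(),      # atom name, e.g. "CA" for an alpha carbon
        "resname": line[17:20].strip(),   # residue or ligand name, e.g. "HOH"
        "chain": line[21],                # chain ID
        "resseq": int(line[22:26]),       # residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),    # temperature factor
        "element": line[76:78].strip(),
    }


def summarize(pdb_text):
    """Collect a few quick facts about the coordinate section of an entry."""
    atoms, n_models = [], 0
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):      # NMR entries usually hold several models
            n_models += 1
        elif line.startswith(("ATOM  ", "HETATM")):
            atoms.append(parse_atom_line(line))

    residues = defaultdict(set)           # chain ID -> residue numbers (ATOM only)
    names = defaultdict(set)              # chain ID -> atom names (ATOM only)
    for atom in atoms:
        if atom["record"] == "ATOM":
            residues[atom["chain"]].add(atom["resseq"])
            names[atom["chain"]].add(atom["name"])

    # A jump in residue numbering only hints at a missing segment.
    gaps = {}
    for chain, numbers in residues.items():
        ordered = sorted(numbers)
        breaks = [(p, q) for p, q in zip(ordered, ordered[1:]) if q - p > 1]
        if breaks:
            gaps[chain] = breaks

    hetero = [a for a in atoms if a["record"] == "HETATM"]
    return {
        "models": n_models,
        "chains": sorted(residues),
        "water_atoms": sum(a["resname"] == "HOH" for a in hetero),
        "ligands": sorted({a["resname"] for a in hetero} - {"HOH"}),
        "partial_occupancy": sum(a["occupancy"] < 1.0 for a in atoms),
        "has_hydrogens": any(a["element"] == "H" for a in atoms),
        "ca_only_chains": sorted(c for c, n in names.items() if n == {"CA"}),
        "numbering_gaps": gaps,
    }


def make_record(record, serial, name, resname, chain, resseq, x, y, z, occ, b, element):
    """Hypothetical helper: format demo data into the fixed-width layout."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}          {element:>2}")


# Made-up demo records, not taken from any real PDB entry.
SAMPLE = "\n".join([
    make_record("ATOM", 1, " N  ", "MET", "A", 1, 27.340, 24.430, 2.614, 1.00, 9.67, "N"),
    make_record("ATOM", 2, " CA ", "MET", "A", 1, 26.266, 25.413, 2.842, 1.00, 10.38, "C"),
    make_record("ATOM", 3, " CA ", "GLN", "A", 2, 26.850, 29.021, 3.898, 0.50, 9.07, "C"),
    make_record("ATOM", 4, " CA ", "LYS", "A", 5, 31.002, 30.115, 6.120, 1.00, 12.40, "C"),
    make_record("ATOM", 5, " CA ", "GLY", "B", 10, 10.500, 12.250, 8.000, 1.00, 20.15, "C"),
    make_record("ATOM", 6, " CA ", "ALA", "B", 11, 13.700, 13.100, 9.450, 1.00, 21.30, "C"),
    make_record("HETATM", 7, " PG ", "ATP", "A", 101, 18.200, 20.300, 5.500, 1.00, 15.00, "P"),
    make_record("HETATM", 8, " O  ", "HOH", "A", 201, 22.100, 19.800, 1.200, 1.00, 30.50, "O"),
    make_record("HETATM", 9, " O  ", "HOH", "A", 202, 24.600, 17.300, 0.400, 1.00, 28.75, "O"),
])

summary = summarize(SAMPLE)
```

For the demo records, `summary` reports chain B as containing only alpha carbons, a numbering gap between residues 2 and 5 in chain A, one atom with partial occupancy, and no hydrogen atoms. Keep in mind that a gap in residue numbering is only a hint; PDB-formatted files list residues that were not observed in the experiment in their REMARK 465 records, which this sketch does not read.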

Except where noted, this feature is written and illustrated by David S. Goodsell.