Guide to Understanding PDB Data

Methods for Determining Atomic Structures

Several methods are currently used to determine the structure of a protein, including X-ray crystallography, NMR spectroscopy, and electron microscopy. Each method has advantages and disadvantages. In each of these methods, the scientist uses many pieces of information to create the final atomic model. First and foremost, the scientist has some kind of experimental data about the structure of the molecule. For X-ray crystallography, this is the X-ray diffraction pattern. For NMR spectroscopy, it is information on the local conformation and on the distances between atoms that are close to one another. In electron microscopy, it is an image of the overall shape of the molecule.

In most cases, this experimental information is not sufficient to build an atomic model from scratch. Additional knowledge about the molecular structure must be added. For instance, we often already know the sequence of amino acids in a protein, and we know the preferred geometry of atoms in a typical protein (for example, the bond lengths and bond angles). This information allows the scientist to build a model that is consistent with both the experimental data and the expected composition and geometry of the molecule.
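As an illustration of the geometric knowledge mentioned above, the sketch below compares an observed bond length with an ideal value. It is a minimal example, assuming rounded ideal backbone bond lengths of about 1.46 Å for N-CA, 1.52 Å for CA-C and 1.33 Å for the C-N peptide bond. The dictionary, the function name check_bond and the 0.05 Å tolerance are hypothetical. Refinement programs combine many restraints of this kind (bond angles, planarity, and more) with the experimental data.

```python
import math

# Approximate ideal backbone bond lengths in angstroms. These are assumed,
# rounded values close to the widely used Engh & Huber restraint set.
IDEAL_BONDS = {
    ("N", "CA"): 1.46,
    ("CA", "C"): 1.52,
    ("C", "N"): 1.33,  # peptide bond to the next residue
}

def check_bond(name1, xyz1, name2, xyz2, tolerance=0.05):
    """Compare one observed bond length with its ideal value.

    Returns (observed_length, deviation, within_tolerance). The 0.05 A
    tolerance is a hypothetical cutoff, not an official validation criterion.
    """
    ideal = IDEAL_BONDS[(name1, name2)]
    observed = math.dist(xyz1, xyz2)
    deviation = observed - ideal
    return observed, deviation, abs(deviation) <= tolerance

# Made-up coordinates: the first bond is ideal, the second is a stretched CA-C bond.
print(check_bond("N", (0.0, 0.0, 0.0), "CA", (1.46, 0.0, 0.0)))
print(check_bond("CA", (0.0, 0.0, 0.0), "C", (1.70, 0.0, 0.0)))
```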

When looking at PDB entries, it is always good to be a bit critical. Keep in mind that the structures in the PDB archive are determined using a balanced mixture of experimental observation and knowledge-based modeling. It often pays to take a little extra time to confirm for yourself that the experimental evidence for a particular structure supports the model as represented and the scientific conclusions based on the model.

X-ray Crystallography

Most of the structures included in the PDB archive were determined using X-ray crystallography. For this method, the protein is purified and crystallized, then subjected to an intense beam of X-rays. The proteins in the crystal diffract the X-ray beam into a characteristic pattern of spots, which are then analyzed (with some tricky methods to determine the phase of the X-ray wave in each spot) to determine the distribution of electrons in the protein. The resulting electron density map is then interpreted to determine the location of each atom. The PDB archive contains two types of data for crystal structures. The coordinate files include atomic positions for the final model of the structure, and the data files include the structure factors from the structure determination (essentially the measured intensities of the X-ray spots in the diffraction pattern; the phases are not measured directly). You can create an image of the electron density map using tools like the Astex viewer, which is available through a link on the Structure Summary page.
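The role of the phases can be illustrated with a toy calculation. This sketch is a deliberate simplification, not how crystallographic software works: it places three point atoms in a one-dimensional unit cell, computes their structure factors F(h) = Σ exp(2πihx), and then rebuilds the density by Fourier synthesis, once with the correct phases and once with every phase set to zero. The atom positions, the cutoff H_MAX and all function names are hypothetical. With the correct phases the highest peak falls on an atom; with the amplitudes alone it falls on the origin, which is why the phases must be recovered by other means.

```python
import cmath
import math

# Toy 1D unit cell: fractional positions of three identical point atoms
# (hypothetical values chosen only for illustration).
ATOMS = [0.12, 0.37, 0.81]
H_MAX = 15  # highest reflection index used; plays the role of resolution

def structure_factor(h):
    """F(h) = sum over atoms of exp(2*pi*i*h*x), with unit scattering factors."""
    return sum(cmath.exp(2j * math.pi * h * x) for x in ATOMS)

# "Measured" data: each reflection has an amplitude |F| and a phase phi.
REFLECTIONS = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}

def density(x, use_true_phases=True):
    """Fourier synthesis: rho(x) = sum_h |F(h)| * cos(2*pi*h*x - phi(h))."""
    rho = 0.0
    for h, f in REFLECTIONS.items():
        phase = cmath.phase(f) if use_true_phases else 0.0
        rho += abs(f) * math.cos(2 * math.pi * h * x - phase)
    return rho

def peak_position(use_true_phases=True, samples=1000):
    """Return the grid point in [0, 1) where the reconstructed density is highest."""
    grid = [i / samples for i in range(samples)]
    return max(grid, key=lambda x: density(x, use_true_phases))

print("true phases:", peak_position(True))   # on (or right next to) one of the atoms
print("zero phases:", peak_position(False))  # on the origin instead
```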

X-ray crystallography can provide very detailed atomic information, showing every atom in a protein or nucleic acid along with atomic details of ligands, inhibitors, ions, and other molecules that are incorporated into the crystal. However, the process of crystallization is difficult and can impose limitations on the types of proteins that may be studied by this method. For example, X-ray crystallography is an excellent method for determining the structures of rigid proteins that form nice, ordered crystals. Flexible proteins, on the other hand, are far more difficult to study by this method because crystallography relies on having many, many molecules aligned in exactly the same orientation, like a repeated pattern in wallpaper. Flexible portions of a protein will often be invisible in crystallographic electron density maps, since their electron density will be smeared over a large space. This is described in more detail on the page about missing coordinates.
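To see why disorder makes density fade, the sketch below models one atom as a Gaussian blob in one dimension and averages it over many molecules. This is a simplified illustration, assuming an arbitrary blob width (SIGMA) and invented positions: when the atom sits in the same place in every molecule the peak stays tall, and when it is spread over a few ångströms the same amount of density is smeared into a much lower, broader bump that can drop below the contour level of a map.

```python
import math

SIGMA = 0.5  # assumed width (standard deviation) of one atom's density, in angstroms

def atom_density(x, center):
    """1D Gaussian density of a single atom, normalized to unit area."""
    return math.exp(-((x - center) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))

def averaged_density(x, centers):
    """Density at x averaged over many molecules whose atom sits at different centers."""
    return sum(atom_density(x, c) for c in centers) / len(centers)

def peak_height(centers, step=0.01):
    """Scan the region around all centers and return the highest averaged density."""
    lo = min(centers) - 3 * SIGMA
    hi = max(centers) + 3 * SIGMA
    n_steps = int((hi - lo) / step) + 1
    return max(averaged_density(lo + i * step, centers) for i in range(n_steps))

ordered = [0.0] * 10                       # the atom is in the same place in every molecule
disordered = [0.4 * i for i in range(10)]  # the atom is spread over 3.6 angstroms
print(round(peak_height(ordered), 2), round(peak_height(disordered), 2))
```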

Biological molecules are finicky: some form perfect, well-ordered crystals, while others form only poor ones. The accuracy of the atomic structure that is determined depends on the quality of these crystals. With well-ordered crystals, we can have far more confidence that the atomic structure correctly reflects the structure of the protein. Two important measures of the accuracy of a crystallographic structure are its resolution, which measures the amount of detail that may be seen in the experimental data, and the R-value, which measures how well the atomic model is supported by the experimental data found in the structure factor file.
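The R-value is conventionally computed as R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, summed over all measured reflections, where Fobs are the observed structure factor amplitudes and Fcalc are those calculated from the atomic model. The minimal sketch below assumes the two sets of amplitudes are already matched reflection by reflection and placed on a common scale (real programs fit that scale, among other things); the amplitudes are made up.

```python
def r_value(f_obs, f_calc):
    """Crystallographic R-value: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_obs and f_calc hold amplitudes for the same reflections,
    in the same order, already on a common scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("need one calculated amplitude per observed reflection")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

# Made-up amplitudes for five reflections.
f_obs = [120.0, 85.0, 40.0, 210.0, 15.0]
f_calc = [110.0, 90.0, 42.0, 200.0, 20.0]
print(round(r_value(f_obs, f_calc), 3))  # 0.068
```

A value of 0 would mean the model reproduces every measured amplitude exactly; larger values mean poorer agreement between model and data.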


The experimental electron density from a structure of DNA is shown here (PDB entry 196d), along with the atomic model that was generated based on the data. The contours surround regions with high densities of electrons, which correspond to the atoms in the molecule. This picture was created with the Astex viewer, which can be accessed by clicking the "EDS" link on the Structure Summary page for this entry.


As part of the biocuration process, the wwPDB generates Validation Reports that provide an assessment of structure quality using widely accepted standards and criteria. These Reports include an "executive" summary image of key quality indicators to help non-experts interpret these reports. For more information, visit

Exploring Biological Structure and Function using X-ray Free Electron Lasers (XFEL)

New technology, termed serial femtosecond crystallography, is revolutionizing the methods of X-ray crystallography. An X-ray free-electron laser (XFEL) is used to create pulses of radiation that are extremely short (lasting only femtoseconds) and extremely bright. A stream of tiny crystals (nanometers to micrometers in size) is passed through the beam, and each X-ray pulse produces a diffraction pattern from a crystal, often destroying the crystal in the process. A full data set is compiled from as many as tens of thousands of these individual diffraction patterns. The method is very powerful because it allows scientists to study molecular processes that occur over very short time scales, such as the absorption of light by biological chromophores.
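A full data set is assembled from many partial snapshots. The simplest way to do this, often described as Monte Carlo averaging, is to average every measurement of each reflection (indexed by its Miller indices h, k, l) across all patterns; with enough snapshots, the random variations from crystal size and partial recording tend to average out. This sketch assumes each snapshot has already been indexed and integrated into a dictionary of intensities. The data layout and numbers are hypothetical, and real pipelines add scaling, partiality corrections and outlier rejection.

```python
from collections import defaultdict
from statistics import mean

# Each snapshot maps a Miller index (h, k, l) to one measured intensity.
# These numbers are invented for illustration.
snapshots = [
    {(1, 0, 0): 98.0, (0, 1, 1): 51.0},
    {(1, 0, 0): 103.0, (2, 1, 0): 10.0},
    {(0, 1, 1): 49.0, (2, 1, 0): 12.0, (1, 0, 0): 99.0},
]

def merge(snapshots):
    """Average every measurement of each reflection across all snapshots.

    Returns {hkl: (mean_intensity, number_of_observations)}.
    """
    observations = defaultdict(list)
    for pattern in snapshots:
        for hkl, intensity in pattern.items():
            observations[hkl].append(intensity)
    return {hkl: (mean(vals), len(vals)) for hkl, vals in observations.items()}

for hkl, (intensity, n) in sorted(merge(snapshots).items()):
    print(hkl, intensity, n)
```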

Structures of photoactive yellow protein determined by serial femtosecond crystallography at a series of times after illumination capture the isomerization of the chromophore after it absorbs light. The structures shown in this movie are 5hd3 (ground state), 5hdc (100-400 femtoseconds after illumination), 5hdd (800-1200 femtoseconds), 5hds (3 picoseconds), 4b9o (100 picoseconds), 5hd5 (200 nanoseconds) and 1ts0 (1 millisecond). For more, see Molecule of the Month on Photoactive Yellow Protein.

NMR Spectroscopy

NMR spectroscopy may be used to determine the structure of proteins. The protein is purified, placed in a strong magnetic field, and then probed with radio waves. A distinctive set of observed resonances may be analyzed to give a list of atomic nuclei that are close to one another, and to characterize the local conformation of atoms that are bonded together. This list of restraints is then used to build a model of the protein that shows the location of each atom. The technique is currently limited to small or medium proteins, since large proteins present problems with overlapping peaks in the NMR spectra.
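Much of the distance information comes from the nuclear Overhauser effect (NOE), whose intensity falls off roughly as the inverse sixth power of the distance between two protons. Under that approximation, a distance can be estimated from a reference pair of known separation as r = r_ref × (I_ref / I)^(1/6). The sketch below assumes this isolated-pair approximation (ignoring spin diffusion and internal motion) and then sorts distances into loose upper-bound classes; the reference values, the 1.8 Å lower bound and the bin edges are illustrative choices, not a standard.

```python
# Assumed calibration: a proton pair with a known distance and its measured
# NOE intensity (both values are hypothetical).
R_REF = 2.5     # angstroms
I_REF = 1000.0  # arbitrary intensity units

# Illustrative upper-bound classes in angstroms; real protocols vary.
BINS = [(2.7, "strong"), (3.5, "medium"), (5.0, "weak")]

def noe_distance(intensity):
    """Estimate a distance from an NOE intensity, assuming I is proportional to r**-6."""
    return R_REF * (I_REF / intensity) ** (1.0 / 6.0)

def distance_restraint(intensity, lower=1.8):
    """Turn an intensity into a (lower, upper, label) restraint, or None if too weak."""
    r = noe_distance(intensity)
    for upper, label in BINS:
        if r <= upper:
            return (lower, upper, label)
    return None

print(noe_distance(I_REF / 64))   # about 5.0: 64 times weaker means twice as far
print(distance_restraint(800.0))  # a short distance, classed as "strong"
```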

A major advantage of NMR spectroscopy is that it provides information on proteins in solution, as opposed to those locked in a crystal or bound to a microscope grid. As a result, NMR spectroscopy is the premier method for studying the atomic structures of flexible proteins. A typical NMR structure includes an ensemble of protein structures, all of which are consistent with the observed list of experimental restraints. The structures in this ensemble are very similar to each other in regions with strong restraints and very different in less constrained portions of the chain. Presumably, these less restrained areas are the flexible parts of the molecule, which therefore do not give a strong signal in the experiment.
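How tightly each part of an ensemble is defined can be quantified by the root-mean-square fluctuation (RMSF) of every atom about its mean position across the models. This is a minimal sketch, assuming the models are already superimposed on a common frame (the superposition step is not shown); the three-model ensemble is invented. Atoms with large values are the poorly restrained, presumably flexible, parts of the chain.

```python
import math

def per_atom_rmsf(models):
    """Root-mean-square fluctuation of each atom about its ensemble mean.

    models: list of models; each model is a list of (x, y, z) tuples with the
    same atoms in the same order. Assumes the models are already superimposed.
    """
    n_models = len(models)
    rmsf = []
    for i in range(len(models[0])):
        coords = [model[i] for model in models]
        mean = tuple(sum(c[k] for c in coords) / n_models for k in range(3))
        msd = sum(math.dist(c, mean) ** 2 for c in coords) / n_models
        rmsf.append(math.sqrt(msd))
    return rmsf

# Three invented models of a three-atom fragment: the first two atoms agree
# closely, while the third (a "floppy" end) wanders.
ensemble = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.5, 0.1, 0.0), (3.0, 2.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.4, 0.0, 0.0), (3.0, -2.0, 0.0)],
]
print([round(v, 2) for v in per_atom_rmsf(ensemble)])
```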

In the PDB archive, you will typically find two types of coordinate entries for NMR structures. The first includes the full ensemble from the structural determination, with each structure designated as a separate model. The second type of entry is a minimized average structure. These files attempt to capture the average properties of the molecule based on the different observations in the ensemble. You can also find a list of restraints that were determined by the NMR experiment. These include things like hydrogen bonds and disulfide linkages, distances between hydrogen atoms that are close to one another, and restraints on the local conformation and stereochemistry of the chain.
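In the legacy PDB file format, each model of an ensemble is enclosed between a MODEL record and an ENDMDL record, and atomic coordinates are read from fixed columns of ATOM and HETATM records (x, y and z in columns 31-38, 39-46 and 47-54). The sketch below splits such text into models. It is a minimal reader, assuming well-formed records; it ignores alternate locations, chain identifiers and all other record types, it does not read the PDBx/mmCIF format, and the helper atom_line exists only to build the demo input.

```python
def read_models(pdb_text):
    """Split legacy-format PDB text into models.

    Returns a list of models; each model is a list of
    (atom_name, residue_name, residue_number, (x, y, z)) tuples.
    A file without MODEL/ENDMDL records is treated as a single model.
    """
    models, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []
        elif record in ("ATOM", "HETATM"):
            atom_name = line[12:16].strip()     # columns 13-16
            residue_name = line[17:20].strip()  # columns 18-20
            residue_number = int(line[22:26])   # columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            current.append((atom_name, residue_name, residue_number, xyz))
        elif record == "ENDMDL":
            models.append(current)
            current = []
    if current:
        models.append(current)
    return models

def atom_line(serial, name, res_name, res_num, x, y, z):
    """Hypothetical demo helper: build a minimal fixed-column ATOM record (chain A)."""
    return (f"ATOM  {serial:5d} {name:^4s} {res_name:3s} A{res_num:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

demo = "\n".join([
    "MODEL        1",
    atom_line(1, "N", "ALA", 1, 11.104, 6.134, -6.504),
    atom_line(2, "CA", "ALA", 1, 11.639, 6.071, -5.147),
    "ENDMDL",
    "MODEL        2",
    atom_line(1, "N", "ALA", 1, 11.200, 6.000, -6.400),
    atom_line(2, "CA", "ALA", 1, 11.700, 6.100, -5.100),
    "ENDMDL",
    "END",
])
models = read_models(demo)
print(len(models), models[1][1])
```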


Some of the restraints used to solve the structure of a small monomeric hemoglobin are shown here, using software from the BioMagResBank [1]. The protein (PDB entries 1vre and 1vrf) is shown in green, and the restraints are shown in yellow.


Electron Microscopy

Electron microscopy is also used to determine the structures of large macromolecular complexes. A beam of electrons is used to image the molecule directly, and several tricks are used to obtain 3D information. If the proteins can be coaxed into forming small crystals, or if they pack symmetrically in a membrane, electron diffraction can be used to generate a 3D density map, using methods similar to X-ray diffraction. If the molecule is highly symmetrical, as virus capsids are, many separate images may be taken, providing a number of different views. These views are then aligned and averaged to extract 3D information. Electron tomography, on the other hand, obtains many views by rotating a single specimen and taking a series of electron micrographs, which are then processed to give the 3D information.
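Averaging works because the structure is the same in every image while the noise is not: if N aligned images carry independent noise, the noise level of their average drops by roughly a factor of √N. The sketch below checks this with a synthetic one-dimensional "image" and Gaussian noise. The signal, noise level and function names are assumptions, and the images are taken as already aligned, which in real work is the hard part.

```python
import random
import statistics

random.seed(0)  # reproducible demo

SIGNAL = [0.0, 1.0, 3.0, 1.0, 0.0] * 20  # invented 100-pixel "projection"
NOISE_SD = 2.0                            # assumed noise level per pixel

def noisy_copy(signal):
    """One simulated image: the true signal plus independent Gaussian noise."""
    return [s + random.gauss(0.0, NOISE_SD) for s in signal]

def average_images(images):
    """Pixel-by-pixel mean of images that are already aligned."""
    return [sum(pixels) / len(images) for pixels in zip(*images)]

def residual_noise(image):
    """Standard deviation of the difference between an image and the true signal."""
    return statistics.pstdev([i - s for i, s in zip(image, SIGNAL)])

for n in (1, 16, 256):
    avg = average_images([noisy_copy(SIGNAL) for _ in range(n)])
    # Measured residual noise next to the expected NOISE_SD / sqrt(n).
    print(n, round(residual_noise(avg), 2), round(NOISE_SD / n ** 0.5, 2))
```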

For a few particularly well-behaved systems, electron diffraction produces atomic-level data, but typically, electron microscopy experiments do not allow the researcher to see each atom. Electron microscopy studies therefore often draw on information from X-ray crystallography or NMR spectroscopy to sort out the atomic details: atomic structures are docked into the EM density map to yield a model of the complex. This has proven very useful for multimolecular structures such as complexes of ribosomes with tRNA and protein factors, and muscle actomyosin.
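Docking an atomic structure into an EM map is commonly treated as a search for the placement that maximizes the correlation between the map and density computed from the model. This sketch reduces that idea to one dimension and to translations only: it slides a short model profile along a longer map profile and keeps the offset with the highest Pearson correlation. The profiles are invented; real fitting tools search rotations and translations in three dimensions and usually refine the best fit afterwards.

```python
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def best_offset(map_profile, model_profile):
    """Exhaustive translational search: return (offset, correlation) of the best fit."""
    width = len(model_profile)
    scores = [
        (pearson(map_profile[i:i + width], model_profile), i)
        for i in range(len(map_profile) - width + 1)
    ]
    score, offset = max(scores)
    return offset, score

# Invented 1D data: the "map" holds a noisy copy of the model starting at index 6.
model = [0.0, 2.0, 5.0, 2.0, 0.0, 3.0, 0.0]
em_map = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.3, 1.9, 4.6, 2.2, 0.4, 2.7, 0.2, 0.1, 0.0]
print(best_offset(em_map, model))
```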


The tail of the T4 bacteriophage has been examined by combining electron microscopy and atomic structures. The image shows a surface rendering of the EM data (emd-1048) with atomic coordinates from PDB entries 1pdf, 1pdi, 1pdl, 1pdm, 1pdp, and 2fl8.