Guide to Understanding PDB Data

Crystallographic Structure Factors and Electron Density, Resolution, and R-value

In a typical crystallographic experiment, a crystal is subjected to a narrow beam of intense X-rays, and the diffraction pattern is observed with a detector or a sheet of film. This pattern forms a characteristic array of spots, commonly referred to as reflections. Crystallographers measure the intensity of these reflections and use the information to determine the distribution of electrons in the crystal. The result is a map of the crystal that shows the distribution of electrons at each point, which may then be interpreted to find coordinates for each atom in the crystallized molecules.

Two pieces of information are needed to create an electron density map: the amplitude of the X-rays in each reflection and their phase. Together, the amplitude and phase define a complex number, termed the structure factor, and the electron density map is calculated from the structure factors by a Fourier transform. In a typical experiment, the amplitudes of the structure factors are obtained by measuring the reflection intensities (each amplitude is proportional to the square root of the corresponding intensity). The phases, however, cannot be measured directly, because detectors record only intensities, so crystallographers have developed several methods to estimate them.
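A one-dimensional toy calculation can make the phase problem concrete. The sketch below is a hypothetical illustration, not how crystallographic software works: it places three point "atoms" in a one-dimensional unit cell, computes their structure factors F(h) = sum over atoms of f_j exp(2πi h x_j), keeps only the amplitudes |F| = sqrt(I) that an experiment would measure, and then rebuilds the density by Fourier synthesis twice, once with the true phases and once with every phase set to zero. The atom positions and electron counts are invented for illustration.

```python
import cmath
import math

# Hypothetical 1D unit cell of length 1: (fractional position x_j, electrons f_j).
ATOMS = [(0.10, 6.0), (0.35, 8.0), (0.70, 35.0)]
H_MAX = 20  # highest reflection index "measured"


def structure_factor(h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j), a complex number."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)


def density(x, amplitudes, phases):
    """Fourier synthesis rho(x) = F(0) + 2 * sum_{h>0} |F(h)| cos(2*pi*h*x - phi(h)).

    Assumes F(-h) is the complex conjugate of F(h) (no anomalous scattering)
    and drops the 1/(cell length) prefactor.
    """
    rho = amplitudes[0] * math.cos(phases[0])
    for h in range(1, len(amplitudes)):
        rho += 2 * amplitudes[h] * math.cos(2 * math.pi * h * x - phases[h])
    return rho


def peak_position(grid, values):
    """Return the grid point with the highest density."""
    return max(zip(values, grid))[1]


F = [structure_factor(h) for h in range(H_MAX + 1)]
intensities = [abs(Fh) ** 2 for Fh in F]           # what the detector records
amplitudes = [math.sqrt(I) for I in intensities]    # |F| = sqrt(I)
true_phases = [cmath.phase(Fh) for Fh in F]         # lost in a real experiment
zero_phases = [0.0] * len(F)                        # a naive guess

grid = [i / 200 for i in range(200)]
peak_true = peak_position(grid, [density(x, amplitudes, true_phases) for x in grid])
peak_guess = peak_position(grid, [density(x, amplitudes, zero_phases) for x in grid])
print(f"highest peak, true phases: x = {peak_true:.2f}")   # prints x = 0.70
print(f"highest peak, zero phases: x = {peak_guess:.2f}")  # prints x = 0.00
```

With the true phases, the highest peak falls on the heavy atom at x = 0.70. With the same amplitudes but zero phases, the map peaks at the origin, where no atom sits: the amplitudes alone are not enough to locate the atoms.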

The traditional method for estimating phases, termed isomorphous replacement, is to add a few electron-dense atoms, such as metal ions, to the crystal and compare its diffraction pattern with that of similar crystals that do not include these heavy atoms. From the differences, researchers can find the locations of the heavy atoms and then estimate phases from those locations. Molecular replacement is also commonly used to estimate phases. In this case, the researcher uses a previously solved structure of the same or a closely related molecule as a starting model and calculates phases from it. More recently, anomalous scattering of X-rays has become a common method for determining phases. In these cases, special atoms such as selenium or bromine are added to the molecules, and the wavelength of the X-rays is carefully tuned to give anomalous scattering by these atoms. By looking at small differences between pairs of related reflections that would have identical intensities without anomalous scattering (Friedel pairs), the phases may be estimated directly.
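The effect that anomalous methods rely on can be shown with a small calculation. Without anomalous scattering, the reflections hkl and -h-k-l (a Friedel pair) have identical amplitudes. When one atom scatters anomalously, its scattering factor gains a real correction f' and an imaginary component f'', and the two amplitudes differ slightly. In the sketch below, the atom coordinates, the angle-independent scattering factors, and the f' and f'' values are all assumed, illustrative numbers.

```python
import cmath
import math

# Hypothetical toy structure: fractional coordinates and a constant normal
# scattering factor f0 for each atom (real f0 falls off with scattering angle).
ATOMS = [
    ((0.12, 0.25, 0.40), 6.0),   # carbon-like
    ((0.55, 0.10, 0.30), 8.0),   # oxygen-like
    ((0.30, 0.70, 0.85), 34.0),  # selenium-like anomalous scatterer
]
ANOMALOUS_ATOM = 2
# Assumed anomalous corrections f' and f'' for the selenium-like atom.
F_PRIME, F_DOUBLE_PRIME = -8.0, 4.0


def structure_factor(hkl, anomalous):
    """F(hkl) = sum_j f_j * exp(2*pi*i*(h*x_j + k*y_j + l*z_j))."""
    h, k, l = hkl
    total = 0j
    for index, ((x, y, z), f0) in enumerate(ATOMS):
        f = complex(f0)
        if anomalous and index == ANOMALOUS_ATOM:
            # Anomalous scattering adds a real shift f' and an imaginary part f''.
            f = complex(f0 + F_PRIME, F_DOUBLE_PRIME)
        total += f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total


def friedel_difference(hkl, anomalous):
    """|F(h,k,l)| - |F(-h,-k,-l)| for one Friedel pair."""
    h, k, l = hkl
    return abs(structure_factor(hkl, anomalous)) - abs(structure_factor((-h, -k, -l), anomalous))


pair = (1, 2, 3)
diff_normal = friedel_difference(pair, anomalous=False)
diff_anomalous = friedel_difference(pair, anomalous=True)
print(f"without anomalous scattering: {diff_normal:.6f}")
print(f"with anomalous scattering:    {diff_anomalous:.6f}")
```

Without the anomalous term the two amplitudes agree to rounding error; with it, they differ. In real data these differences are usually small, so they must be measured accurately.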

For many of the structures in the PDB, the authors have deposited the primary crystallographic data along with the atomic model that was determined using the data. These data files may be downloaded from the Structure Summary pages. The files include a list of all of the reflections that were used in the structure determination. A typical file includes the h, k, and l indices of each reflection, a measure of the amplitude or intensity of the reflection, and often a measure of its standard uncertainty (sigma). The file may also include other information, such as a flag identifying the reflections set aside for the free R-value calculation, or other details of the experiment.
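To show what such a reflection list looks like in practice, the sketch below reads a tiny excerpt written in the style of a PDBx/mmCIF structure-factor file. The column names follow the _refln category as an assumption (check the files you download for the exact columns present), the values are made up, and the hypothetical parse_refln_loop function handles only a single, unquoted loop; for real files, a dedicated CIF library is a better choice.

```python
from dataclasses import dataclass

# Hypothetical excerpt in the style of a PDBx/mmCIF structure-factor file.
# Values are invented; real files have thousands of rows and may contain
# other columns (e.g. intensities) or several data blocks.
SAMPLE = """
data_example
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.F_meas_au
_refln.F_meas_sigma_au
_refln.status
0 0 2 1520.3 12.1 o
0 0 4  310.8  5.4 f
1 0 1  842.0  9.7 o
1 2 3   97.5  3.2 o
"""


@dataclass
class Reflection:
    h: int
    k: int
    l: int
    amplitude: float
    sigma: float
    free: bool  # True if set aside for the free R-value (status "f", assumed)


def parse_refln_loop(text):
    """Minimal parser for one unquoted _refln loop (illustrative only)."""
    tags, rows, in_loop = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_refln."):
            tags.append(line.split(".", 1)[1])  # e.g. "index_h"
        elif in_loop and tags and line and not line.startswith(("_", "#", "data_")):
            rows.append(dict(zip(tags, line.split())))
    return [
        Reflection(int(r["index_h"]), int(r["index_k"]), int(r["index_l"]),
                   float(r["F_meas_au"]), float(r["F_meas_sigma_au"]),
                   r["status"] == "f")
        for r in rows
    ]


reflections = parse_refln_loop(SAMPLE)
n_free = sum(r.free for r in reflections)
print(f"{len(reflections)} reflections, {n_free} flagged for R-free")  # prints 4 reflections, 1 flagged for R-free
```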

Tip: You will find selenomethionine amino acids in many recent structures of proteins. This is a common way for researchers to add selenium to proteins for use in determining phases by anomalous scattering. Since selenium is chemically similar to sulfur, we expect the protein structure to be essentially the same as the form with normal methionine amino acids.


The left image shows one plane through the three-dimensional diffraction pattern of a DNA crystal. Each spot has a characteristic intensity that is related to the distribution of electrons in the crystal. For instance, the rows of dark spots 10 rows above and below the center are characteristic of the stacking of bases in DNA. The right image shows the electron density derived from the diffraction pattern of PDB entry 6bna, created using the Astex Viewer. The view shows one base pair, formed by a guanine and a bromocytosine. The blue contours enclose most of the electrons and show the overall shape of the bases, and the yellow contours enclose only regions with high electron density, such as the electron-dense bromine atom.

Resolution

Resolution is a measure of the quality of the data that have been collected on the crystal containing the protein or nucleic acid. If all of the proteins in the crystal are aligned in an identical way, forming a nearly perfect crystal, then all of the proteins will scatter X-rays the same way, and the diffraction pattern will show the fine details of the crystal. On the other hand, if the proteins in the crystal are all slightly different, due to local flexibility or motion, the diffraction pattern will not contain as much fine information. Resolution is therefore a measure of the level of detail present in the diffraction pattern and of the level of detail that will be seen when the electron density map is calculated. High-resolution structures, with resolution values of about 1 Å, are highly ordered, and it is easy to see every atom in the electron density map. Lower-resolution structures, with resolution values of 3 Å or more, show only the basic contours of the protein chain, and the atomic structure must be inferred. Most crystallographically determined structures of proteins fall between these two extremes. As a rule of thumb, we have more confidence in the atomic positions of structures with small resolution values, called "high-resolution structures".
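Resolution values are quoted in ångströms because they refer to spacings in the crystal. Each reflection corresponds to a set of lattice planes with spacing d, related to the scattering angle 2θ by Bragg's law, λ = 2d sin θ. Reflections farther from the center of the diffraction pattern correspond to smaller d, that is, finer detail, and the resolution of a data set is the smallest d among the reflections used. The sketch below assumes an orthorhombic cell (all angles 90°), for which 1/d² = h²/a² + k²/b² + l²/c²; the cell dimensions and reflection indices are hypothetical.

```python
import math


def bragg_d(wavelength, two_theta_deg):
    """Bragg's law with n = 1: d = wavelength / (2 sin(theta))."""
    theta = math.radians(two_theta_deg / 2)
    return wavelength / (2 * math.sin(theta))


def d_spacing_orthorhombic(hkl, cell):
    """1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2; valid only when all cell angles are 90 degrees."""
    (h, k, l), (a, b, c) = hkl, cell
    return 1 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


# A reflection recorded at 2-theta = 45 degrees with copper K-alpha X-rays (1.5418 A).
print(f"d = {bragg_d(1.5418, 45.0):.2f} A")  # prints d = 2.01 A

# Hypothetical orthorhombic cell (a, b, c in A) and a few measured reflections.
cell = (40.0, 50.0, 60.0)
measured = [(1, 0, 0), (10, 5, 3), (20, 25, 30)]
spacings = [d_spacing_orthorhombic(hkl, cell) for hkl in measured]
resolution = min(spacings)  # the finest spacing sampled by the data
print(f"resolution = {resolution:.2f} A")  # prints resolution = 1.15 A
```

The high-index reflection (20, 25, 30) sets the resolution here; if the crystal were less ordered and such reflections were too weak to measure, the resolution value would be larger.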


Electron density maps for structures with a range of resolutions are shown. The first three show tyrosine 103 from myoglobin, from entries 1a6m (1.0 Å resolution), 106m (2.0 Å resolution), and 108m (2.7 Å resolution). The final example shows tyrosine 130 from hemoglobin (chain B), from entry 1s0h (3.0 Å resolution). In the pictures, the blue and yellow contours surround regions of high electron density, and the atomic model is shown with sticks. The electron density was imaged using the Astex viewer.

 

R-value and R-free

The R-value is a measure of the quality of the atomic model obtained from the crystallographic data. When solving the structure of a protein, the researcher first builds an atomic model and then calculates a simulated diffraction pattern based on that model. The R-value measures how well the simulated diffraction pattern matches the experimentally observed diffraction pattern. A totally random set of atoms will give an R-value of about 0.63, whereas a perfect fit would have a value of 0. Typical values are about 0.20.
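The conventional crystallographic R-value compares observed and calculated structure-factor amplitudes over all reflections: R = sum(| |F_obs| - k|F_calc| |) / sum(|F_obs|), where k puts the calculated amplitudes on the same scale as the observed ones. The sketch below assumes a single overall scale factor computed as the ratio of the summed amplitudes, a simplification; refinement programs use more elaborate scaling. The amplitudes are hypothetical.

```python
def r_value(f_obs, f_calc):
    """R = sum(| |F_obs| - k|F_calc| |) / sum(|F_obs|).

    The scale factor k = sum(F_obs) / sum(F_calc) is a simple assumption;
    refinement programs use more elaborate scaling.
    """
    k = sum(f_obs) / sum(f_calc)
    numerator = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical observed and model-calculated amplitudes for four reflections.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [95.0, 85.0, 55.0, 45.0]
print(f"R = {r_value(f_obs, f_calc):.3f}")                     # prints R = 0.071
print(f"perfect fit: R = {r_value(f_obs, f_obs):.3f}")         # prints perfect fit: R = 0.000
doubled = [2 * fc for fc in f_calc]                            # same model, different scale
print(f"rescaled model: R = {r_value(f_obs, doubled):.3f}")    # prints rescaled model: R = 0.071
```

Because of the scale factor, the R-value depends only on how well the model reproduces the relative amplitudes, not on their absolute scale.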

A fit may not be perfect for many reasons. One major reason is that protein and nucleic acid crystals contain large channels filled with water. Most of this water does not have a defined structure, so it is not included in the atomic model. Other reasons include disorder and vibration that are not accounted for in the model.

There is one potential problem with using R-values to assess the quality of a structure. Refinement is used to adjust the atomic model of a given structure so that it fits the experimental data better, which lowers the R-value. Unfortunately, this introduces bias into the process, since the atomic model is used along with the diffraction pattern to calculate the electron density, and a model with enough adjustable parameters can be made to fit noise in the data. The R-free value is a less biased measure. Before refinement begins, about 10% of the experimental observations are removed from the data set, and refinement is performed using the remaining 90%. The R-free value is then calculated from how well the model predicts the 10% of the observations that were not used in refinement. For an ideal model that is not over-interpreting the data, the R-free will be similar to the R-value. Typically, it is a little higher, with a value of about 0.26.
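The bookkeeping behind R-free can be sketched in a few lines: flag a random subset of reflections as the test set before refinement, then compute the same R-value formula separately over the working reflections (R-work, the conventional R-value) and over the flagged ones (R-free). The sketch below assumes a 10% test set chosen with a fixed random seed and uses synthetic amplitudes in place of real data; the function names are hypothetical, and in a real refinement only the working reflections would be used to adjust the model.

```python
import random


def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Flag a random subset of reflections as the test (free) set before refinement."""
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free_indices = set(rng.sample(range(n_reflections), n_free))
    return [i in free_indices for i in range(n_reflections)]


def r_factor(pairs):
    """R = sum(|F_obs - F_calc|) / sum(F_obs), assuming F_calc is already on the F_obs scale."""
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def r_work_and_r_free(f_obs, f_calc, is_free):
    """R-work uses the reflections available to refinement; R-free uses only the flagged ones."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    return r_factor(work), r_factor(test)


# Synthetic stand-ins for real data: random "observed" amplitudes and a
# hypothetical model whose calculated amplitudes are off by about 20%.
rng = random.Random(42)
f_obs = [rng.uniform(10.0, 1000.0) for _ in range(1000)]
f_calc = [abs(fo * rng.gauss(1.0, 0.2)) for fo in f_obs]

is_free = split_free_set(len(f_obs))
r_work, r_free = r_work_and_r_free(f_obs, f_calc, is_free)
print(f"{sum(is_free)} free reflections; R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

Because this synthetic model was never fitted to either subset, the two values come out similar. After a real refinement, an R-work much lower than R-free is a sign that the model is fitting noise in the working set.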

For more information on bias and R-values, see "Model Building and Refinement Practice" by G. J. Kleywegt and T. A. Jones, Methods in Enzymology 277, 208-230 (1997).