Guide to Understanding PDB Data

R-value and R-free

The R-value is a measure of how well an atomic model agrees with the crystallographic data it was built from. When solving the structure of a protein, the researcher first builds an atomic model and then calculates a simulated diffraction pattern from that model. The R-value measures how closely the simulated diffraction pattern matches the experimentally observed one. A totally random set of atoms gives an R-value of about 0.63, whereas a perfect fit would give a value of 0. Typical values are about 0.20.
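The comparison described above is conventionally written as R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, where the sums run over all measured reflections. The following is a minimal sketch of that formula, not the procedure of any particular refinement program. The function name `r_value` and the toy amplitudes are illustrative assumptions, and the observed and calculated amplitudes are assumed to be on a common scale already.

```python
def r_value(f_obs, f_calc):
    """Crystallographic R-value for paired structure-factor amplitudes.

    Assumes f_obs and f_calc list the same reflections in the same order
    and are already on a common scale (no scale factor is fitted here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    # Sum of absolute differences between observed and calculated amplitudes
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    # Sum of observed amplitudes, used for normalization
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Toy amplitudes (hypothetical values, for illustration only)
observed = [10.0, 20.0, 30.0, 40.0]
calculated = [12.0, 18.0, 33.0, 37.0]
print(round(r_value(observed, calculated), 3))
```

A model that reproduces every observed amplitude exactly would give 0 here, matching the "perfect fit" case in the text.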

A fit may fall short of perfect for many reasons. One major reason is that protein and nucleic acid crystals contain large channels of water. This water does not have a defined structure and is not included in the atomic model. Other reasons include disorder and vibration that are not accounted for in the model.

There is one potential problem with using R-values to assess the quality of a structure. Refinement is used to adjust the atomic model so that it fits the experimental data better, which lowers the R-value. Unfortunately, this introduces bias, since the atomic model is used along with the diffraction pattern to calculate the electron density, and the R-value is then computed against the same data the model was fitted to.

The R-free value is a less biased measure. Before refinement begins, about 10% of the experimental observations are removed from the data set, and refinement is performed using the remaining 90%. The R-free value is then calculated by checking how well the model predicts the 10% that were not used in refinement. For an ideal model that is not over-interpreting the data, the R-free will be similar to the R-value. Typically, it is a little higher, with a value of about 0.26.
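The bookkeeping behind R-free can be sketched as follows: set aside a random fraction of reflections before refinement, then compute the same agreement statistic separately on the working set and the held-out set. This is a simplified illustration under stated assumptions. The function names, the random selection scheme, and the toy amplitudes are all hypothetical, and no refinement is actually performed here.

```python
import random


def r_value(f_obs, f_calc):
    """R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|), amplitudes on a common scale."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.10, seed=0):
    """Return (work_indices, free_indices).

    The free set is chosen at random before refinement and is never
    used to adjust the model. The 10% default follows the text.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = sorted(indices[:n_free])
    work = sorted(indices[n_free:])
    return work, free


def r_work_and_r_free(f_obs, f_calc, work, free):
    """Compute the R-value on the working set and on the held-out free set."""
    def pick(values, idx):
        return [values[i] for i in idx]

    r_work = r_value(pick(f_obs, work), pick(f_calc, work))
    r_free = r_value(pick(f_obs, free), pick(f_calc, free))
    return r_work, r_free


# Toy example: 100 reflections with hypothetical amplitudes.
work, free = split_free_set(100)
free_lookup = set(free)
f_obs = [10.0] * 100
# Pretend the model fits the working reflections slightly better
# than the held-out ones, as usually happens after refinement.
f_calc = [8.0 if i in free_lookup else 9.0 for i in range(100)]
r_work, r_free = r_work_and_r_free(f_obs, f_calc, work, free)
print(round(r_work, 2), round(r_free, 2))
```

A large gap between the two numbers would suggest that the model has been fitted to noise in the working set, which is the over-interpretation the text warns about.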

For more information on bias and R-values, see "Model Building and Refinement Practice" by G. J. Kleywegt and T. A. Jones, Methods in Enzymology 277, 208-230 (1997).