
PDB Data, RCSB PDB, and KBase


RCSB PDB users who want to explore the functions of genes in plant and microbial genomes may find the DOE Systems Biology Knowledgebase (KBase) useful. KBase is a platform for systems biology research that offers protein structure-related tools, visualizations, and workflows that strengthen connections to PDB data.

KBase is a cloud-based, open-source software system that allows researchers to integrate, analyze, and visualize large-scale biological data from various sources, such as genomics, proteomics, and metabolomics data, as well as macromolecular structure data from the PDB. KBase also provides a suite of tools for modeling and simulating biological systems, and it supports collaboration and the sharing of data and results among researchers. The goal of KBase is to advance systems biology research and accelerate the discovery of new biological insights and technologies.

Using KBase in Research

After creating an account, users can take advantage of the utilities offered by KBase by designing project Narratives: notebooks that contain a sequence of steps in a data processing pipeline (and associated documentation) for transforming and analyzing a given set of input data (Figure 1). For example, you could create a Narrative that gathers homologs for a given protein sequence, generates a multiple sequence alignment of the homologs, and then produces a phylogenetic tree, as done in this use case (a KBase account is required to view this Narrative).

Narratives are constructed by chaining together a series of pre-built modules called “apps”, each of which performs a specific step in the pipeline (e.g., importing a collection of FASTA files or running a BLAST search on an input sequence). (For those familiar with Python Jupyter Notebooks, KBase Narratives work in a similar way.) Users can even create their own apps to perform tasks specific to their research objectives. The final assembled Narrative can then be made public so that it may be shared with other researchers and/or incorporated into a publication.
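To make the chaining idea concrete, here is a minimal Python sketch that models a Narrative as an ordered list of steps, each one consuming the previous step's output. It is a conceptual illustration only and does not use KBase or its SDK; `run_pipeline`, `clean_sequence`, and `composition` are hypothetical names chosen for this example, not real KBase apps.

```python
from typing import Any, Callable, Dict, List

# A "step" stands in for one app: it takes the previous output and returns a new one.
# Conceptual sketch only; this is not the KBase SDK.
Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Run each step in order, passing each result on to the next step."""
    for step in steps:
        data = step(data)
    return data


def clean_sequence(seq: str) -> str:
    """Hypothetical import step: strip whitespace and uppercase the residues."""
    return "".join(seq.split()).upper()


def composition(seq: str) -> Dict[str, int]:
    """Hypothetical analysis step: count how often each residue occurs."""
    counts: Dict[str, int] = {}
    for residue in seq:
        counts[residue] = counts.get(residue, 0) + 1
    return counts


result = run_pipeline([clean_sequence, composition], " mkt ayiak\n")
```

Here `result` is a residue count for the cleaned sequence `MKTAYIAK`. Swapping, adding, or removing steps changes the workflow without touching the others, which mirrors how apps are rearranged in a Narrative.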

Narratives can be developed for a wide range of research applications, such as:

  • Assembly and annotation
  • Sequence analysis
  • Metabolic modeling
  • RNA-seq and expression analysis
  • Comparative genomics

Figure 1: Example of a KBase Narrative. (Figure from Arkin, A. et al., 2018.)

Leveraging KBase with RCSB.org services

PDB data can be integrated with KBase in many ways. For example, you can use the cheminformatics analysis and metabolic modeling tools in KBase to identify gene candidates of interest, and then query the RCSB PDB for experimentally determined structures corresponding to those candidates, or for their closest structural homologs if no direct matches are available. KBase also lets you import and visualize your own computed structure models (CSMs) of the gene products (e.g., AlphaFold2 structure predictions). These models can then be used at RCSB.org to run structure similarity searches and to overlay related structures when analyzing important features such as binding domains and ligand interactions. Additionally, you can run many types of RCSB.org queries directly from within KBase to retrieve a list of matching experimental PDB structures, such as sequence similarity searches for a given input protein sequence or searches for a particular Enzyme Commission (EC) number of interest.
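The two kinds of RCSB.org queries mentioned here, a sequence similarity search and an EC number search, can be expressed as JSON requests to the RCSB PDB Search API. The Python sketch below builds those request bodies and posts them using only the standard library. It assumes the v2 endpoint `https://search.rcsb.org/rcsbsearch/v2/query` and the attribute name `rcsb_polymer_entity.rcsb_ec_lineage.id` for EC numbers; the function names and default cutoffs are illustrative choices, not the values used by the KBase apps.

```python
import json
import urllib.request

# Assumed RCSB PDB Search API (v2) endpoint.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def sequence_query(sequence: str, evalue_cutoff: float = 0.1,
                   identity_cutoff: float = 0.9) -> dict:
    """Request body for a protein sequence similarity search (illustrative cutoffs)."""
    return {
        "query": {
            "type": "terminal",
            "service": "sequence",
            "parameters": {
                "evalue_cutoff": evalue_cutoff,
                "identity_cutoff": identity_cutoff,
                "sequence_type": "protein",
                "value": sequence,
            },
        },
        # Return matching polymer entities, identified like '4HHB_1'.
        "return_type": "polymer_entity",
    }


def ec_number_query(ec_number: str) -> dict:
    """Request body for entries containing an entity with the given EC number."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                # Assumed attribute holding EC classification lineage IDs.
                "attribute": "rcsb_polymer_entity.rcsb_ec_lineage.id",
                "operator": "exact_match",
                "value": ec_number,
            },
        },
        "return_type": "entry",
    }


def post_query(payload: dict) -> dict:
    """POST a query to the Search API and return the decoded JSON (needs network)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        body = resp.read()
    # An empty body (e.g. when nothing matches) is treated as zero hits.
    if not body:
        return {"total_count": 0, "result_set": []}
    return json.loads(body.decode("utf-8"))
```

For example, `post_query(ec_number_query("3.2.1.4"))` would request entries annotated with EC 3.2.1.4 (cellulase). Keeping payload construction separate from the network call makes the queries easy to inspect or adapt before sending them.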

The following case studies (made accessible as KBase Narratives) demonstrate some of the powerful ways in which KBase can leverage PDB data through the use of RCSB PDB APIs.

KBase-RCSB PDB apps

The construction of workflows such as those in the case studies above is made possible by several pre-built apps (listed below) that let you import and query PDB data directly from your KBase Narratives. These apps rely on the Search and Data APIs provided by RCSB PDB: the Search API finds PDB structures that match a given set of parameters, and the Data API retrieves detailed data for those structures.
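The Search and Data APIs fit together simply: the Search API returns a list of matching identifiers, and the Data API serves detailed JSON for each entry. The Python sketch below, again using only the standard library, pulls identifiers out of a Search API response, reduces polymer-entity IDs such as `4HHB_1` to entry IDs, and fetches and summarizes entries from the Data API's REST endpoint. The endpoint URL and the field names it reads (`result_set`, `identifier`, `rcsb_id`, `struct.title`, `rcsb_entry_info.resolution_combined`) are assumptions based on the public RCSB PDB API schema; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional

# Assumed RCSB PDB Data API REST endpoint for entry-level data.
DATA_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{entry_id}"


def hit_identifiers(search_response: dict) -> List[str]:
    """Return identifiers such as '4HHB' or '4HHB_1' from a Search API response."""
    return [hit["identifier"] for hit in search_response.get("result_set", [])]


def entry_ids(entity_ids: List[str]) -> List[str]:
    """Map polymer-entity IDs like '4HHB_1' to unique entry IDs, preserving order.

    Assumes every ID ends in '_<entity number>'; rsplit only removes that suffix.
    """
    unique: List[str] = []
    for entity_id in entity_ids:
        entry = entity_id.rsplit("_", 1)[0]
        if entry not in unique:
            unique.append(entry)
    return unique


def entry_url(entry_id: str) -> str:
    """Build the Data API URL for one entry."""
    return DATA_ENTRY_URL.format(entry_id=entry_id)


def fetch_entry(entry_id: str) -> dict:
    """Download the JSON record for one entry (requires network access)."""
    with urllib.request.urlopen(entry_url(entry_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_entry(entry: dict) -> dict:
    """Pick a few fields from an entry record; missing fields become None."""
    resolutions: Optional[List[float]] = (
        entry.get("rcsb_entry_info", {}).get("resolution_combined")
    )
    return {
        "id": entry.get("rcsb_id"),
        "title": entry.get("struct", {}).get("title"),
        "resolution": resolutions[0] if resolutions else None,
    }
```

A typical loop would be `for eid in entry_ids(hit_identifiers(response)): print(summarize_entry(fetch_entry(eid)))`. Only `fetch_entry` touches the network, so the parsing helpers can be checked against saved responses.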

The KBase-RCSB PDB apps currently available for use in Narratives are listed below. (Note that you must be registered with and signed in to a KBase account to access these links.)

Tutorials and Resources

References

Arkin, A., Cottingham, R., Henry, C. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase (2018) Nat Biotechnol 36: 566–569 https://doi.org/10.1038/nbt.4163