Gaël McGill, Ph.D. and Graham Johnson, Molecular Animators

Images of molecules are becoming more and more common in educational and entertainment media. These pictures are often created by computer graphics artists using state-of-the-art programs such as Maya and Cinema4D. However, the methods used to import PDB structures into these advanced programs can be challenging. David Goodsell recently spoke with two molecular graphics professionals to see what is available and what still needs to be done.

Q. First off, can you tell me a bit about yourselves and the work you are doing in molecular graphics?

A. Gaël McGill: Cell and molecular biology has been a passion since my middle school days. I came to the USA as an undergraduate specifically to study molecular biology and my dream was to be involved in research. I went on to do my Ph.D. at Harvard Medical School (mostly focusing on cancer signal transduction pathways and apoptosis using a varied mix of cell and molecular biology, biochemistry, animal model/genetics, and screening approaches). Having identified a need in the local academic, medical, biotech and pharmaceutical communities for "scientifically-informed" graphic design and web programming services, I started my company Digizyme (www.digizyme.com) in 1999 during my Ph.D. years. On a personal level, this really served as a creative outlet outside of the long hours at the bench. Digizyme has grown to offer more advanced services in recent years--including 3D animation services and even product design and visualization for the biomedical device industry. Over the past few years, I have "reintegrated" into academia in hopes of establishing a full-time team of scientist-animators at Harvard Medical School (where I currently teach Maya molecular visualization classes year-round). I enjoy the variety and challenge of client-driven projects as part of my work at Digizyme, and look forward to the freedom of pursuing larger-scale, longer-term collaborative projects relating to fundamental cell/molecular visualization challenges as an academic.

A. Graham Johnson: I graduated from the Department of Art as Applied to Medicine in the Johns Hopkins School of Medicine with a master's degree in Medical and Biological Illustration in 1997. At Hopkins, we studied anatomy and physiology with medical school students while simultaneously sketching and illustrating our dissections along with autopsies and surgeries observed at the hospital. After graduating, I focused on studying molecular and cell biology while illustrating the textbook Cell Biology with Tom Pollard and Bill Earnshaw. I began animating this content for other clients and realized that trained medical illustrators could contribute a great deal to this relatively unillustrated subject. It occurred to me that one could simulate most of my animations either by scripting the physics or by translating imported data to a format that my 3D software package could recognize. I began simulating a handful of my client animations, but quickly learned that the out-of-the-box software was not intended for this purpose and could only be used for inaccurate and simple molecular interactions. In 2005, I applied to the Ph.D. program at The Scripps Research Institute to work in the Molecular Graphics Laboratory under Arthur Olson to better understand the content and to communicate directly with a team of talented molecular graphics coders. As I head into my fourth year of the program, I'm finally producing some useable scripts that have made it easier to create illustrations and animations of molecular realms. I hope to distribute many of the tools very soon.

Q. How do you import PDB structures? Are these tools generally available?

A. Gaël McGill: For the most part, we use existing molecular graphics applications like Chimera and PyMOL to generate geometry files. These are typically exported as VRML (Virtual Reality Modeling Language) and then converted to OBJ format (a common data format for 3D data) before being imported into Maya. We also use MEL (Maya Embedded Language) scripts--either ones already available online (although currently there are not many related to PDB), or in-house/custom ones. Which method we use depends on what we will be doing with the geometry once inside Maya. A great option for bringing in large PDB datasets has been Chimera's "multiscale models" feature. Eventually it would be great to create a similar functionality for creating polygonal models within Maya itself in order to have more control over the output geometry. Still, this type of tool has been very useful in creating animations showcasing large complexes (like entire viruses).

A. Graham Johnson: I've written a COFFEE plug-in (COFFEE is Cinema4D's native scripting language) that imports a single PDB file or a list of PDB files directly into my viewer window as a set of points in space (used to generate smooth surface models such as metaballs), CPK spheres, or a backbone spline. I'm building a primitive ribbon generator and hope to make the tools available for use within the next year. If I require a more sophisticated surface model, e.g., one colored by electrostatic potential, I'll export it from one of the popular molecular viewers as either a VRML or an OBJ file. Again, for static images, I'll often just export a screen grab from a molecular viewer that offers a style I'm after.
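To make the "set of points in space" idea concrete, here is a minimal Python sketch of reading atom positions from the fixed-column ATOM/HETATM records of a PDB file. It is not the COFFEE plug-in or any MEL script mentioned in the interview; the names `Atom` and `parse_pdb_atoms` are hypothetical, and the sketch ignores alternate locations, multiple models, and other details a production importer would need to handle.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier
    x: float
    y: float
    z: float


def parse_pdb_atoms(text: str, ca_only: bool = False) -> List[Atom]:
    """Collect atom positions from ATOM/HETATM records (fixed columns).

    With ca_only=True, keep only alpha carbons from ATOM records, which is
    enough to drive a backbone spline.
    """
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if len(line) < 54:  # coordinates occupy columns 31-54
            continue
        if ca_only and not (line.startswith("ATOM") and line[12:16] == " CA "):
            continue
        atoms.append(
            Atom(
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            )
        )
    return atoms
```

The resulting list could then be handed to whatever the 3D package expects: one sphere per atom for a CPK look, or the alpha carbons alone as control points for a backbone curve.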

The Synapse Revealed created by Graham Johnson for the Howard Hughes Medical Institute Bulletin 2004

Q. What type of molecular imagery is most popular with your clients?

A. Graham Johnson: Because molecular graphics viewers are so user-friendly these days, clients rarely come to me to request an image or movie of a single molecule spinning on a monochromatic background. Most of my clients ask me to generate an editorial image, or to illustrate or animate a process or cell event involving multiple molecule types.

A. Gaël McGill: Although it depends on the project, I find that clients (especially biotech and pharma) want images of molecules "in context." In other words, scenery that captures a molecular process of interest but also places it within a cellular landscape. The challenge is to create a still image that captures or suggests a narrative or mechanism... essentially an "action shot" in which the visual context of the structure being depicted (and its binding partners) helps to communicate its function.

Q. I imagine that assembly of biologically relevant complexes (such as chromatin or a transcription complex) and modeling their dynamics poses difficult challenges--what types of tools do you use for this?

A. Gaël McGill: This is one of the toughest challenges at the moment (and it does not apply only to complexes): how does one visually represent the dynamic aspects of proteins based on available (mostly static) data? The ability to create linear morphs between multiple conformational states of a protein using the adiabatic mapping technique (used by Mark Gerstein's method at www.molmovdb.org, for example) is very useful to visualize one possible trajectory, but it is only one possible trajectory and it also cannot tackle more complex morphs that involve partial refolding of protein domains. Drew Berry at The Walter and Eliza Hall Institute of Medical Research has pioneered a visual style that suggests the dynamics of proteins, but it would be nice to create animations that are based on actual data for these dynamics (i.e., as opposed to using noise/fractal motions throughout, having vibrations and degrees of flexibility that reflect the protein's actual range of 'thermodynamically-permissible' motion). In packages like Maya, we are currently limited to using pretty basic kinematic tools (i.e., building rigs driven by forward or inverse kinematics) that intrinsically have no knowledge of the molecular structure and its limitations or range of permissible torsion/bending. The software does not even register or warn against impending self-intersections--a problem that we are currently exploring in collaboration with topologists/software developers from the entertainment industry. At the moment (and depending on the target audience), we try to find as many sources of reference data as possible and use them as "inspiration" to create a dynamic representation of a protein or complex. The goal is to find more direct ways of integrating these data into the visualization (inasmuch as it helps communicate crucial parts of the story).
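For readers unfamiliar with morphing, the simplest possible version can be sketched in a few lines of Python: straight-line interpolation of matched atom positions between two conformations. This is deliberately not adiabatic mapping, which as described in the glossary generates physically more plausible intermediates; the function name `lerp_conformations` is hypothetical, and the sketch assumes both states list the same atoms in the same order.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def lerp_conformations(
    start: Sequence[Coord], end: Sequence[Coord], n_frames: int
) -> List[List[Coord]]:
    """Return n_frames coordinate sets moving linearly from start to end.

    Assumes start[i] and end[i] are the same atom in the two states.
    """
    if len(start) != len(end):
        raise ValueError("both conformations must contain the same atoms")
    if n_frames < 2:
        raise ValueError("need at least a first and a last frame")
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(
            [
                tuple(a + t * (b - a) for a, b in zip(p, q))
                for p, q in zip(start, end)
            ]
        )
    return frames
```

Because every atom travels in a straight line, intermediate frames can contain distorted bonds or clashing atoms, which is one reason a purely linear morph shows only one possible, and not necessarily realistic, trajectory.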

A. Graham Johnson: I've attempted to rig a handful of complex builders over the years with out-of-the-box toolsets. Such tools often do a great job of roughing a concept together, but fail when applied to large-scale systems or attempts to accurately simulate the rigor and detail often required for molecular imagery. Years ago, for example, I tried to stitch together thousands of blocks with pairs of springs to represent the persistence length and flexible backbone of DNA in a plasmid. I animated a twist to see if it would supercoil, but the collision detector would always overload and the system would come to a screeching halt before the DNA could achieve a single twist. I've tried pouring virtual molecules into virtual organelles to fill them with random recipes of non-colliding molecules, but again, the technique has always proved to be slow, limited in volume, and relatively uncontrollable. Most particle generators I've toyed with have similar limitations to their physics simulation. To overcome these challenges, I've begun to construct scripts from scratch that attempt to combine the capabilities of simplified molecular dynamics with the visualization power of commercial 3D software.
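The "fill a volume with non-colliding molecules" problem can be illustrated with a deliberately naive Python sketch. It treats every molecule as a sphere of the same radius and a cubic box as the organelle, both simplifying assumptions not taken from the interview, and the name `pack_spheres` is hypothetical. It is not the approach used in any of the scripts described here.

```python
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]


def pack_spheres(
    n: int, radius: float, box: float, max_tries: int = 10000, seed: int = 0
) -> List[Coord]:
    """Place up to n non-overlapping spheres inside a cube of side `box`.

    Random positions are proposed one at a time and rejected if they overlap
    any sphere already placed. Returns fewer than n centers if max_tries
    proposals are used up first.
    """
    if box <= 2 * radius:
        raise ValueError("box is too small to hold a single sphere")
    rng = random.Random(seed)
    min_dist_sq = (2 * radius) ** 2
    centers: List[Coord] = []
    tries = 0
    while len(centers) < n and tries < max_tries:
        tries += 1
        candidate = tuple(rng.uniform(radius, box - radius) for _ in range(3))
        # Compare against every accepted sphere: O(n) per proposal.
        if all(
            sum((a - b) ** 2 for a, b in zip(candidate, other)) >= min_dist_sq
            for other in centers
        ):
            centers.append(candidate)
    return centers
```

Each proposal is checked against every sphere already placed, and as the box fills up most proposals are rejected, so the cost climbs steeply with density. That behavior is consistent with the "slow, limited in volume" experience described above.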

Early Events in Reovirus Entry by Gaël McGill. The full movie can be viewed online at www.molecularmovies.com/movies/mcgilliwasa_reovirus.html

Q. Are there any resources that you would suggest to artists interested in incorporating PDB structures into their work?

A. Gaël McGill: Other than the fantastic PDB itself (not sure what we would do without it!), I recently launched a free resource for scientists interested in learning 3D software packages for cell and molecular visualization at www.molecularmovies.org. One section is a showcase/directory of some of the web's best cell and molecular movies (organized by scientific topic), and another is dedicated to tutorials and lectures. There are currently hundreds of pages of free tutorials that approach learning Maya in the context of biological visualization. More specifically, several of these tutorials focus on getting PDB data into 3D applications like Maya. Expansion of the site in the near future will also include a "Toolkit" section where animators can share scripts and plugins for PDB import (and other tasks related to molecular animation), and a new section that provides a more general directory of visual resources. The idea behind this last section is to find and organize non "narrative-driven" raw data visualizations (i.e. like time-lapse movies, MD simulations and other datasets) that animators can use as reference materials to create better visualizations.

A. Graham Johnson: The updated and integrated Electron Microscopy Data Bank (emdatabank.org) offers many low-resolution models of macromolecular structures and has a new online EM viewer. Many files in the PDB exist as low-resolution structures with only alpha carbon coordinates published. If you need a rough approximation of the sidechains to generate a teaching model for such a molecule, you can generate or download a pre-generated version from MaxSprout (www.ebi.ac.uk/Tools/maxsprout). Lastly, I find the TransMembrane PDB indispensable (pdbtm.enzim.hu).

Q. Have you had any projects that posed an insurmountable challenge?

A. Gaël McGill: The great thing about cell and molecular visualization is that there is an endless source of topics/mechanisms to visualize and each of these comes with its own unique challenges. We may not always use the optimal solution or have the perfect tool available, but there is almost always a creative way to solve the visual representation challenges that emerge. It is one of the aspects of visualization with powerful packages like Maya that make this work so fresh and exciting!

A. Graham Johnson: Many projects have, and I've often had to truncate my personal goals or compromise with the client to find some work-around because of strict deadlines. In years past, I sometimes had to resort to keyframe animation, hand-drawn animation, simple 2D vector animation, and even static imagery to convey a message that could have been most clearly presented as a 3D animated sequence... I simply lacked the technology, skill, or time. Finding out, however, that molecular animation posed more challenges than my other medical illustration jobs directly inspired me to build tool sets to help meet such challenges.

Q. What new tools would you like to have?

A. Gaël McGill: As noted above, we are in the process of creating a suite of MEL scripts that can address some of the basic geometry-building tasks for getting PDB data into Maya (without having to resort to molecular graphics software-exported meshes). Once we have this first set of scripts (that just focus on efficient/clean geometry creation), the next step would be to explore the development of programmatically-driven rigging tools for defining the articulation of the models. In other words, to write scripts that not only create Maya-native geometry directly from the PDB but also automatically create a rig that has some inherent motion constraints applied. This is easier said than done and will of course depend on the type of molecular representation (ball & stick versus cartoon for example would have very different 'rules' applied to constrain motion). Having geometry that is more 'self-aware' (and that can at least avoid or warn about self-intersections) would be useful.

A. Graham Johnson: I agree that methods for exporting molecular models in styles that are animation-ready would be very helpful to everyone in the molecular illustration field. I would primarily like to see an extension of the PDB that offers biological unit matrices to help users generate pertinent symmetries. This works great in PDB files for viruses that have BIOMT lines in REMARK 350 to describe the transformation and orientation matrices needed to generate a complete virus [1]. More specifically, I'd love to see this for other common cell complexes. How can one generate an in vivo microtubule with 13 protofilaments and a proper seam from 1TUB, for example? What rotation per y translation might a user need to enter to generate an actin filament from an actin monomer? A handful of filamentous files exist in the PDB, but animators can benefit from viewport and render time efficiencies afforded by modern software by cloning a single molecule rather than rendering coordinates from the thousands of copies of 1TUB needed to generate a lengthy microtubule.
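As an illustration of how the BIOMT lines could drive the "clone a single molecule" workflow, the Python sketch below reads the REMARK 350 BIOMT1/2/3 rows into 3x4 matrices (a rotation plus a translation) and applies one to a point. The function names are hypothetical, and the parser makes a simplifying assumption: it splits on whitespace, so it would fail on records where adjacent numeric fields run together.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[float, float, float, float]  # three rotation terms + translation
Operator = List[Row]                     # three rows: BIOMT1, BIOMT2, BIOMT3


def parse_biomt(text: str) -> List[Operator]:
    """Collect the BIOMT operators listed under REMARK 350."""
    rows: Dict[int, Dict[int, Row]] = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if (
            len(parts) == 8
            and parts[0] == "REMARK"
            and parts[1] == "350"
            and parts[2] in ("BIOMT1", "BIOMT2", "BIOMT3")
        ):
            row_index = int(parts[2][5]) - 1  # BIOMT1 -> row 0, and so on
            serial = int(parts[3])            # operator number
            rows[serial][row_index] = tuple(float(v) for v in parts[4:8])
    # Keep only operators for which all three rows were found.
    return [
        [rows[s][0], rows[s][1], rows[s][2]]
        for s in sorted(rows)
        if len(rows[s]) == 3
    ]


def apply_operator(op: Operator, point: Tuple[float, float, float]):
    """Rotate and translate one point: p' = R * p + t."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in op)
```

In an animation package, each operator would more naturally be assigned as the transform of an instance of the original model than applied to every atom. The same 3x4 form could in principle describe a helical repeat (a fixed rotation plus a fixed translation along the filament axis), but the actual values for actin or a microtubule are not given in this interview and are left open here.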

A. Gaël McGill: Basic collision detection is also not easy to implement at the moment (whether between different parts of the same continuous mesh or between meshes). Some way of integrating electrostatic forces would also be amazing! Better simulation tools would also help us create molecular vistas with some resemblance to what is happening in vivo. By simulation I don't mean at the same atom-by-atom level that molecular dynamics offers, but something that would drive the stochastic behavior of numerous molecules within a defined volume or environment, for example. Finally, an area that is ripe for exploration: we need to tap into the full promise of educational gaming and interactive environments by harnessing the power of modern gaming engines. In many cases, the digital assets (models, textures, rigs) used to develop high-end games are created in packages like Maya. So one could easily imagine a scenario where a lot of the work being done to create 'narrative-driven' molecular movies in Maya could be repurposed and adapted to generate interactive molecular environments for educational purposes.

Q. What packages do you typically use for your molecular animation projects?

Graham Johnson: In 1998, I generated most of my instructional static images directly from a package called Ribbons. It offered the most attractive defaults, endlessly adjustable styles, and one of the better-developed graphical user interfaces for its time. To this day, its outlining feature helps produce some of the most pedagogically useful rendering styles that can be reduced to nearly impossible sizes while allowing structures to remain legible. For glitzier editorial renderings, I used a computer graphics pipeline patched together by Dr. Witek Kwiatkowski (The Salk Institute) to meet my specific goals. It converts output from a variety of molecular viewers to a freeware renderer called POV-Ray. We can export surface models with electrostatic potentials from GRASP, for example, and fancy beaded ribbon models from MolScript. The modern molecular viewer PyMOL now emulates this multi-hour process with the click of 2 or 3 buttons on any operating system.

For static pedagogic imagery, I still prefer to use renderings from molecular graphics viewers such as Ribbons, PyMOL, or Chimera. I've also recently had my eyes opened to PMV (Python Molecular Viewer), which can generate a variety of outline styles in real-time that rival or beat the best of the commercial cartoon-style renderers for creating a pencil-sketch-styled contour, for example. For a workflow pipeline, such as for creating a secondary signaling cascade involving 12 PDB files, I find it most efficient to first create a sketch (in a familiar medium like pencil and paper if you want the final composition to look its best!). I render each molecule individually, combine snapshots of the images, and then fill in the background to couch the molecules in their proper context (e.g., draw organelle bilayers and matrices) with familiar tools such as those found in Photoshop or Illustrator. I use the model snapshots directly or just as skeletons for hand-rendered blobs to make each molecule less busy for more complicated scenes or to match an intended style so molecules blend into the background. To this day, I paint over most every 3D rendering I create to reduce that geometric, plastic, "computery" look, but as non-photorealistic algorithms and my lighting/texturing skills improve, I can spend significantly less time on this step each year.

Although PMV may change this soon, most molecular viewers do not offer a thorough (or even basic) set of animation tools that commercial 3D software users would recognize. For that reason, I have always turned to commercial packages. I started using Strata Studio Pro in 1996, but switched to the more stable Maxon's Cinema 4D (C4D) in 1999. I still use C4D to generate images with fancy textures, lights, or complicated scenes involving more than one molecule or editorial scenes involving molecules out of context (e.g., a nucleosome clamped under an old dissection magnifying glass on a desk). I also use C4D to generate all of my animations because it offers some easy to use tools for keyframing, character rigging, and physics/particle simulation.

Gaël McGill: I use Autodesk Maya Unlimited and Adobe After Effects, usually in combination with UCSF's Chimera and Warren DeLano's PyMOL (and sometimes Maxon's Cinema4D for metaballs), when dealing with PDB, EM, or microscopy datasets. Occasionally, I also use Pixologic ZBrush, Luxology modo, and Adobe Flash, depending on the animation and its delivery format: self-running versus interactive and/or web-based movie.

Q. We often use many representations to visualize biological molecules: space filling, bonds, ribbons, etc. Do any of these cause particular problems?

Gaël McGill: When initially created within molecular graphics packages (like Chimera) and then exported to Maya, certain types of geometry can result in very heavy files (cartoon/ribbons and high-resolution surface meshes in particular). The other aspect is that the meshes tend to be messy and unpredictable in terms of the order of vertices in the polygonal model and other properties that one could use to programmatically rebuild the geometry within an application like Maya. The best solution moving forward will be to create MEL scripts that start directly from the PDB coordinate file and generate identical-looking ribbon or surface representations--but ones that are much lighter and cleaner because they have been built within Maya using more 'optimized' types of geometry, like NURBS (non-uniform rational B-splines) for example.

Graham Johnson: I've encountered similar problems with each type of representation export. Some molecular viewers don't take advantage of the VRML file format and will export each sphere in a CPK model as a spherical mesh containing dozens or hundreds of points. Most packages, however, do just export a translation, radius and texture for each atom. Also, depending on the formatting used, a CPK model from one package might take 2 seconds to import, while the same model from a different molecular viewer export may take 3 minutes to import. Some will come in with a new texture map for each atom rather than references, so you'll end up with hundreds of oxygen reds for a typical protein. Each molecular viewer creates a slightly different looking ribbon style, which is nice to keep them recognizable, but difficult if you want to merge offerings from different packages. Most offer little adjustment, but PMV has a profile editor that will let you draw your own extrusion shape for helices, coils and beta sheets.

Surface models tend to be less problematic, but I find that most of them have redundant points describing each vertex, i.e., each polygon has its own corner point, even though the corner points meet. To fix this, I usually "optimize" any set of polygons I bring in to C4D, which merges all of the redundant points, cleans the look of the model mesh by allowing the shading model to function properly, and reduces the memory requirement of the model, often to ~30% of the original import. Ribbon models have become more predictable and easy to import in recent years, but be sure to set your Phong tags correctly so the nice 90 degree edges on your beta strands don't look rounded off. These models will also appear tessellated and require an "optimization" to repair.
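The idea behind merging redundant points can be sketched in Python as follows. This is a generic vertex-welding routine, not Cinema 4D's actual "optimize" implementation; the function name `weld_vertices` and the tolerance value are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Face = Tuple[int, ...]


def weld_vertices(
    vertices: Sequence[Coord], faces: Sequence[Face], tolerance: float = 1e-6
) -> Tuple[List[Coord], List[Face]]:
    """Merge coincident vertices and re-index the faces that use them.

    Vertices are snapped to a grid of spacing `tolerance`; points landing in
    the same grid cell are treated as one shared vertex.
    """
    key_to_index: Dict[Tuple[int, int, int], int] = {}
    welded: List[Coord] = []
    remap: List[int] = []  # old vertex index -> new vertex index
    for v in vertices:
        key = tuple(round(c / tolerance) for c in v)
        if key not in key_to_index:
            key_to_index[key] = len(welded)
            welded.append(tuple(v))
        remap.append(key_to_index[key])
    new_faces = [tuple(remap[i] for i in face) for face in faces]
    return welded, new_faces
```

Once neighboring polygons reference the same vertex index, the renderer can average normals across them, which is what lets smooth shading work, and the vertex list shrinks because each shared corner is stored only once. Grid snapping can miss two nearly identical points that fall on opposite sides of a cell boundary, so a production tool would likely use a proper nearest-neighbor search.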

Clients often want to animate a probable transition between two crystallized states of a particular molecule. All of the styles molecular viewers generate have inconsistent point numbers and point assignments. Morphing, for example, from a surface model of one protein conformation derived from a PDB file to a second conformation from a different PDB file inevitably fails. Even if the PDB files have the exact same sequences, the point order of their surface or ribbon meshes will differ with surface area and portions of the model will therefore turn inside out en route to their new position when interpolated. To work around this, I've had to use the PDB data more directly and generate a surface skin on the fly with metaballs while the raw data's atom skeleton moves "properly" below the surface (proper for animation rigging, not for detailed structural biology). A similar approach is required to transition smoothly from a helix to a random coil in a ribbon model as it morphs between states, but easy and automated methods don't yet exist and most techniques produce strobing intermediates that severely distract from understanding the backbone's motion.


Character rigging: defining the form and articulation of a virtual model
Keyframe: a scene that defines the beginning or end of a smooth transition in animation
Pipeline: a series of (computational) methods used in succession ("computer graphics pipeline")
OBJ format: a file format for 3D geometry
Meshing: the process of defining an object as a collection of polygons
Point cloud: a collection of points used to define an object
Vertex order: the order of vertices in a polygonal model
Phong averaging: a method for interpolating normal vectors defining polygons, which are then used in the Phong shading technique
Adiabatic mapping: a method for generating atomic conformations intermediate between two given sets of coordinates
Metaballs: a method for generating 3D objects composed of a collection of "blobby" shapes
Rigging tools: computational methods for character rigging
MEL scripts: scripts written in MEL (Maya Embedded Language), the scripting language built into Maya

About the Animators

Gaël McGill's website at www.molecularmovies.org hosts cell and molecular animations, along with tutorials on how to import PDB data into programs such as Maya.

Graham Johnson publishes animations and more at FiVth Medical Media (www.fivth.com).

[1] Representation of viruses in the remediated PDB archive (2008) Acta Cryst. D64: 874-882. doi: 10.1107/S0907444908017393

This interview originally appeared in the Spring 2009 Newsletter