publications | E.Z. Lab

2025

Multiscale guidance of AlphaFold3 with heterogeneous cryo-EM data

Rishwanth Raghu, Axel Levy, Gordon Wetzstein, and Ellen D. Zhong

In Neural Information Processing Systems (NeurIPS), 2025

Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes, often requiring heuristic MSA subsampling approaches for generating alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity, but is challenged by arduous pipelines to go from raw experimental data to atomic models. Here, we bridge the gap between these modalities, combining cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained protein structure prediction model using both global and local structural constraints derived from density maps, driving predictions towards conformational states consistent with the experimental data. We demonstrate that this flexible yet powerful inference-time approach allows us to build atomic models into heterogeneous cryo-EM maps across a variety of dynamic biomolecular systems including transporters and antibodies.
@inproceedings{Raghu2025CryoBoltz, title = {Multiscale guidance of AlphaFold3 with heterogeneous cryo-EM data}, author = {Raghu, Rishwanth and Levy, Axel and Wetzstein, Gordon and Zhong, Ellen D.}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2025}, month = dec, }
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra

Ziyu Xiong, Yichi Zhang, Foyez Alauddin, Chu Xin Cheng, Joon Soo An, Mohammad R. Seyedsayamdost, and Ellen D. Zhong

In Neural Information Processing Systems (NeurIPS), 2025

Nuclear Magnetic Resonance (NMR) spectroscopy is a cornerstone technique for determining the structures of small molecules and is especially critical in the discovery of novel natural products and clinical therapeutics. Yet, interpreting NMR spectra remains a time-consuming, manual process requiring extensive domain expertise. We introduce CHEFNMR (CHemical Elucidation From NMR), an end-to-end framework that directly predicts an unknown molecule’s structure solely from its 1D NMR spectra and chemical formula. We frame structure elucidation as conditional generation from an atomic diffusion model built on a non-equivariant transformer architecture. To model the complex chemical groups found in natural products, we generated a dataset of simulated 1D NMR spectra for over 111,000 natural products. CHEFNMR predicts the structures of challenging natural product compounds with an unsurpassed accuracy of over 65%. This work takes a significant step toward solving the grand challenge of automating small-molecule structure elucidation and highlights the potential of deep learning in accelerating molecular discovery.
@inproceedings{Xiong2025ChefNMR, title = {Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra}, author = {Xiong, Ziyu and Zhang, Yichi and Alauddin, Foyez and Cheng, Chu Xin and An, Joon Soo and Seyedsayamdost, Mohammad R. and Zhong, Ellen D.}, booktitle = {Neural Information Processing Systems (NeurIPS)}, year = {2025}, month = dec, }
CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets

Axel Levy, Rishwanth Raghu, Ryan Feathers, Michal Grzadkowski, Frederic Poitevin, Jake D. Johnston, Francesca Vallese, Oliver B. Clarke, Gordon Wetzstein, and Ellen D. Zhong

Nature Methods, 2025

Proteins and other biomolecules form dynamic macromolecular machines that are tightly orchestrated to move, bind and perform chemistry. Cryo-electron microscopy and cryo-electron tomography can access the intrinsic heterogeneity of these complexes and are therefore key tools for understanding their function. However, three-dimensional reconstruction of the collected imaging data presents a challenging computational problem, especially without any starting information, a setting termed ab initio reconstruction. Here we introduce cryoDRGN-AI, a method leveraging an expressive neural representation and combining an exhaustive search strategy with gradient-based optimization to process challenging heterogeneous datasets. Using cryoDRGN-AI, we reveal new conformational states in large datasets, reconstruct previously unresolved motions from unfiltered datasets and demonstrate ab initio reconstruction of biomolecular complexes from in situ data. With this expressive and scalable model for structure determination, we hope to unlock the full potential of cryo-electron microscopy and cryo-electron tomography as a high-throughput tool for structural biology and discovery.
@article{Levy2025CryoDRGNAI, title = {CryoDRGN-AI: neural ab initio reconstruction of challenging cryo-EM and cryo-ET datasets}, author = {Levy, Axel and Raghu, Rishwanth and Feathers, Ryan and Grzadkowski, Michal and Poitevin, Frederic and Johnston, Jake D. and Vallese, Francesca and Clarke, Oliver B. and Wetzstein, Gordon and Zhong, Ellen D.}, journal = {Nature Methods}, volume = {22}, number = {7}, pages = {1486--1494}, day = {26}, month = jun, year = {2025}, publisher = {Nature Publishing Group}, doi = {10.1038/s41592-025-02720-4}, url = {https://www.nature.com/articles/s41592-025-02720-4}, }
Cryo-ET reveals the in situ architecture of the polar tube invasion apparatus from microsporidian parasites

Mahrukh Usmani, Nicolas Coudray, Margot Riggi, Rishwanth Raghu, Harshita Ramchandani, Daija Bobe, Mykhailo Kopylov, Ellen D. Zhong, Janet H. Iwasa, Damian C. Ekiert, and Gira Bhabha

Proceedings of the National Academy of Sciences, 2025

Microsporidia are an early-branching group of fungi. More than 1,500 species of microsporidia have been reported, which can infect a wide range of animal hosts. These include humans, as well as ecologically and economically important animals such as honey bees and aquatic animals. Microsporidia have evolved unique organelles, such as an invasion apparatus, called the polar tube, and a closely associated membranous organelle, called the polaroplast. Studying microsporidian cell biology has been hampered by a lack of genetic tools. Imaging approaches have thus been invaluable in studying their biology. Here, we use an integrative approach, combining cryoelectron tomography with cellular modeling to provide insights into the ultrastructure of the microsporidian invasion organelle, and associated organelles in situ. Microsporidia are divergent fungal pathogens that employ a unique harpoon-like apparatus called the polar tube (PT) to invade host cells. The long PT is fired out of the microsporidian spore over the course of just a few hundred milliseconds. Once fired, the PT is thought to pierce the plasma membrane of a target cell and act as a conduit for the transfer of the parasite into the host cell, which initiates infection. The PT architecture and its association with neighboring organelles within the parasite cell remain poorly understood. Here, we use cryoelectron tomography to investigate the structural cell biology of the PT in dormant spores from the human-infecting microsporidian species, Encephalitozoon intestinalis. Segmentation and subtomogram averaging of the PT reveal at least four layers: two protein-based layers surrounded by a membrane layer and filled with a dense core. Regularly spaced protein filaments form the structural skeleton of the PT. 
Combining cryoelectron tomography with cellular modeling, we propose a model for the three-dimensional organization of the polaroplast, an organelle that surrounds the PT and is continuous with the outermost, membranous layer of the PT. Our results reveal the ultrastructure of the microsporidian invasion apparatus in situ, laying the foundation for understanding infection mechanisms.
@article{Usmani2025PolarTube, title = {Cryo-ET reveals the in situ architecture of the polar tube invasion apparatus from microsporidian parasites}, author = {Usmani, Mahrukh and Coudray, Nicolas and Riggi, Margot and Raghu, Rishwanth and Ramchandani, Harshita and Bobe, Daija and Kopylov, Mykhailo and Zhong, Ellen D. and Iwasa, Janet H. and Ekiert, Damian C. and Bhabha, Gira}, journal = {Proceedings of the National Academy of Sciences}, volume = {122}, number = {11}, pages = {e2415233122}, month = mar, year = {2025}, doi = {10.1073/pnas.2415233122}, url = {https://www.pnas.org/doi/abs/10.1073/pnas.2415233122}, }
The Inaugural Flatiron Institute Cryo-EM Conformational Heterogeneity Challenge

Miro A. Astore, Geoffrey Woollard, David Silva-Sanchez, Wenda Zhou, Mykhailo Kopylov, Khanh Dao Duc, Roy R. Lederman, Yilai Li, Yi Zhou, Jing Yuan, Fei Ye, Quanquan Gu, Remi Vuillemot, Slavica Jonic, Lan Dang, Steven J. Ludtke, Hannah Bridges, Serena Liu, Michael McLean, Valentin Peretroukhin, Johannes Schwab, Eduardo R. Cruz-Chu, Peter Schwander, Marc A. Gilles, Amit Singer, David Herreros, Jose Maria Carazo, Carlos Oscar S. Sorzano, J. Ryan Feathers, Ellen D. Zhong, Nikolaus Grigorieff, Pilar Cossio, and Sonya M. Hanson

bioRxiv, 2025

Despite the rise of single particle cryo-electron microscopy (cryo-EM) as a premier method for resolving macromolecular structures at atomic resolution, methods to address molecular heterogeneity in vitrified samples have yet to reach maturity. With an increasing number of new methods to analyze the multitude of heterogeneous states captured in single particle images, a systematic approach to validation in this field is needed. With this motivation, we issued a challenge to the community to analyze two cryo-EM image particle sets of the thyroglobulin molecule with continuous conformational heterogeneity. The first dataset was experimental, and the second was generated with a simulator, allowing control over the distribution of molecular structures in the particle images. This simulated dataset also enabled direct comparison between participants’ submissions and the ground truth molecular structures and distributions. Participants were asked to submit 80 volumes representing the heterogeneous ensemble and estimate their respective populations in the image sets provided. Participation of the research community in the challenge was strong, with submissions from nearly all developers of heterogeneity methods, resulting in 41 submissions across both datasets. Submissions qualitatively exceeded expectations, with the molecular motions identified by methods resembling both each other and the ground truth motion. However, quantitatively assessing these similarities was a challenge in and of itself. In the process of assessing the submissions to this challenge, we developed several validation metrics, most of which require reference to the underlying ground truth volumes. However, we have also explored the use of metrics which do not necessarily reference ground truth. This is particularly apt for experimental datasets where ground truth is inaccessible.
These approaches allowed us to assess the similarity and accuracy in volume quality, molecular motions, and conformational distribution of different submissions. These metrics and the efforts of all participants will help chart a path forward for the improvements of heterogeneity methods for cryo-EM and future challenges to validate these new methods as they continue to be developed by the community.
@article{astore2025inaugural, title = {The Inaugural Flatiron Institute Cryo-EM Conformational Heterogeneity Challenge}, author = {Astore, Miro A. and Woollard, Geoffrey and Silva-Sanchez, David and Zhou, Wenda and Kopylov, Mykhailo and Dao Duc, Khanh and Lederman, Roy R. and Li, Yilai and Zhou, Yi and Yuan, Jing and Ye, Fei and Gu, Quanquan and Vuillemot, Remi and Jonic, Slavica and Dang, Lan and Ludtke, Steven J. and Bridges, Hannah and Liu, Serena and McLean, Michael and Peretroukhin, Valentin and Schwab, Johannes and Cruz-Chu, Eduardo R. and Schwander, Peter and Gilles, Marc A. and Singer, Amit and Herreros, David and Carazo, Jose Maria and Sorzano, Carlos Oscar S. and Feathers, J. Ryan and Zhong, Ellen D. and Grigorieff, Nikolaus and Cossio, Pilar and Hanson, Sonya M.}, journal = {bioRxiv}, year = {2025}, publisher = {Cold Spring Harbor Laboratory}, doi = {10.1101/2025.07.18.665582}, }

2024

CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM

Minkyu Jeon, Rishwanth Raghu, Miro Astore, Geoffrey Woollard, Ryan Feathers, Alkin Kaz, Sonya M Hanson, Pilar Cossio, and Ellen D. Zhong

In Neural Information Processing Systems (NeurIPS), 2024

Spotlight

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining high-resolution 3D biomolecular structures from imaging data. Its unique ability to capture structural variability has spurred the development of heterogeneous reconstruction algorithms that can infer distributions of 3D structures from noisy, unlabeled imaging data. Despite the growing number of advanced methods, progress in the field is hindered by the lack of standardized benchmarks with ground truth information and reliable validation metrics. Here, we introduce CryoBench, a suite of datasets, metrics, and benchmarks for heterogeneous reconstruction in cryo-EM. CryoBench includes five datasets representing different sources of heterogeneity and degrees of difficulty. These include conformational heterogeneity generated from designed motions of antibody complexes or sampled from a molecular dynamics simulation, as well as compositional heterogeneity from mixtures of ribosome assembly states or 100 common complexes present in cells. We then analyze state-of-the-art heterogeneous reconstruction tools, including neural and non-neural methods, assess their sensitivity to noise, and propose new metrics for quantitative evaluation. We hope that CryoBench will be a foundational resource for accelerating algorithmic development and evaluation in the cryo-EM and machine learning communities. Project page: https://cryobench.cs.princeton.edu.
@inproceedings{Jeon2024CryoBench, title = {CryoBench: Diverse and challenging datasets for the heterogeneity problem in cryo-EM}, author = {Jeon, Minkyu and Raghu, Rishwanth and Astore, Miro and Woollard, Geoffrey and Feathers, Ryan and Kaz, Alkin and Hanson, Sonya M and Cossio, Pilar and Zhong, Ellen D.}, booktitle = {Neural Information Processing Systems (NeurIPS)}, note = {Spotlight}, month = dec, year = {2024}, }
Mixture of neural fields for heterogeneous reconstruction in cryo-EM

Axel Levy*, Rishwanth Raghu*, David Shustin*, Adele Rui-Yang Peng, Huan Li, Oliver Biggs Clarke, Gordon Wetzstein, and Ellen D. Zhong

In Neural Information Processing Systems (NeurIPS), 2024

Cryo-electron microscopy (cryo-EM) is an experimental technique for protein structure determination that images an ensemble of macromolecules in near-physiological contexts. While recent advances enable the reconstruction of dynamic conformations of a single biomolecular complex, current methods do not adequately model samples with mixed conformational and compositional heterogeneity. In particular, datasets containing mixtures of multiple proteins require the joint inference of structure, pose, compositional class, and conformational states for 3D reconstruction. Here, we present Hydra, an approach that models both conformational and compositional heterogeneity fully ab initio by parameterizing structures as arising from one of K neural fields. We employ a new likelihood-based loss function and demonstrate the effectiveness of our approach on synthetic datasets composed of mixtures of proteins with large degrees of conformational variability. We additionally demonstrate Hydra on an experimental dataset of a cellular lysate containing a mixture of different protein complexes. Hydra expands the expressivity of heterogeneous reconstruction methods and thus broadens the scope of cryo-EM to increasingly complex samples.
@inproceedings{Levy2024Hydra, title = {Mixture of neural fields for heterogeneous reconstruction in cryo-EM}, author = {Levy*, Axel and Raghu*, Rishwanth and Shustin*, David and Peng, Adele Rui-Yang and Li, Huan and Clarke, Oliver Biggs and Wetzstein, Gordon and Zhong, Ellen D.}, booktitle = {Neural Information Processing Systems (NeurIPS)}, month = dec, year = {2024}, }
CryoDRGN-ET: deep reconstructing generative networks for visualizing dynamic biomolecules inside cells

Ramya Rangan*, Ryan Feathers*, Sagar Khavnekar, Adam Lerer, Jake Johnston, Ron Kelley, Martin Obr, Abhay Kotecha, and Ellen D. Zhong

Nature Methods, 2024

Advances in cryo-electron tomography (cryo-ET) have produced new opportunities to visualize the structures of dynamic macromolecules in native cellular environments. While cryo-ET can reveal structures at molecular resolution, image processing algorithms remain a bottleneck in resolving the heterogeneity of biomolecular structures in situ. Here, we introduce cryoDRGN-ET for heterogeneous reconstruction of cryo-ET subtomograms. CryoDRGN-ET learns a deep generative model of three-dimensional density maps directly from subtomogram tilt-series images and can capture states diverse in both composition and conformation. We validate this approach by recovering the known translational states in Mycoplasma pneumoniae ribosomes in situ. We then perform cryo-ET on cryogenic focused ion beam–milled Saccharomyces cerevisiae cells. CryoDRGN-ET reveals the structural landscape of S. cerevisiae ribosomes during translation and captures continuous motions of fatty acid synthase complexes inside cells. This method is openly available in the cryoDRGN software.
@article{Rangan2024, author = {Rangan*, Ramya and Feathers*, Ryan and Khavnekar, Sagar and Lerer, Adam and Johnston, Jake and Kelley, Ron and Obr, Martin and Kotecha, Abhay and Zhong, Ellen D.}, title = {CryoDRGN-ET: deep reconstructing generative networks for visualizing dynamic biomolecules inside cells}, month = jun, year = {2024}, doi = {10.1038/s41592-024-02340-4}, publisher = {Nature Publishing Group US New York}, url = {https://www.nature.com/articles/s41592-024-02340-4}, journal = {Nature Methods} }
Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frédéric Poitevin, Ellen D. Zhong, and Gordon Wetzstein

arXiv, 2024

The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these inverse problems for 3D structure determination, but are specialized for a predefined type of measurement. Here, we introduce a versatile framework to turn raw biophysical measurements of varying types into 3D atomic models. Our method combines a physics-based forward model of the measurement process with a pretrained generative model providing a task-agnostic, data-driven prior. Our method outperforms posterior sampling baselines on both linear and non-linear inverse problems. In particular, it is the first diffusion-based method for refining atomic models from cryo-EM density maps.
@article{Levy2024ADP3D, title = {Solving Inverse Problems in Protein Space Using Diffusion-Based Priors}, author = {Levy, Axel and Chan, Eric R. and Fridovich-Keil, Sara and Poitevin, Frédéric and Zhong, Ellen D. and Wetzstein, Gordon}, year = {2024}, month = jun, day = {6}, journal = {arXiv}, eprint = {2406.04239}, archiveprefix = {arXiv}, primaryclass = {cs.LG} }
Accurate structure prediction of biomolecular interactions with AlphaFold 3

Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel, Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick, Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O’Neill, David Reiman, Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie, Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve, Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs, Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin, Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram, Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek, Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis, and John M. Jumper

Nature, 2024

The introduction of AlphaFold 2 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state of the art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3. Together these results show that high accuracy modelling across biomolecular space is possible within a single unified deep learning framework.
@article{Abramson2024, author = {Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J. and Bambrick, Joshua and Bodenstein, Sebastian W. and Evans, David A. and Hung, Chia-Chun and O'Neill, Michael and Reiman, David and Tunyasuvunakool, Kathryn and Wu, Zachary and {\v{Z}}emgulyt{\.{e}}, Akvil{\.{e}} and Arvaniti, Eirini and Beattie, Charles and Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and Congreve, Miles and Cowen-Rivers, Alexander I. and Cowie, Andrew and Figurnov, Michael and Fuchs, Fabian B. and Gladman, Hannah and Jain, Rishub and Khan, Yousuf A. and Low, Caroline M. R. and Perlin, Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine and Yakneen, Sergei and Zhong, Ellen D. and Zielinski, Michal and {\v{Z}}{\'i}dek, Augustin and Bapst, Victor and Kohli, Pushmeet and Jaderberg, Max and Hassabis, Demis and Jumper, John M.}, title = {Accurate structure prediction of biomolecular interactions with AlphaFold 3}, journal = {Nature}, year = {2024}, month = may, day = {08}, issn = {1476-4687}, doi = {10.1038/s41586-024-07487-w}, url = {https://doi.org/10.1038/s41586-024-07487-w}, }

2023

Conformational states of the microtubule nucleator, the γ-tubulin ring complex

Brianna Romer, Sophie M. Travis, Brian P. Mahon, Collin T. McManus, Philip D. Jeffrey, Nicolas Coudray, Rishwanth Raghu, Michael J. Rale, Ellen D. Zhong, Gira Bhabha, and Sabine Petry

bioRxiv, 2023

Microtubules (MTs) perform essential functions in the cell, and it is critical that they are made at the correct cellular location and cell cycle stage. This nucleation process is catalyzed by the γ-tubulin ring complex (γ-TuRC), a cone-shaped protein complex composed of over 30 subunits. Despite recent insight into the structure of vertebrate γ-TuRC, which shows that its diameter is wider than that of a MT, and that it exhibits little of the symmetry expected for an ideal MT template, the question of how γ-TuRC achieves MT nucleation remains open. Here, we utilized single particle cryo-EM to identify two conformations of γ-TuRC. The helix composed of 14 γ-tubulins at the top of the γ-TuRC cone undergoes substantial deformation, which is predominantly driven by bending of the hinge between the GRIP1 and GRIP2 domains of the γ-tubulin complex proteins. However, surprisingly, this deformation does not remove the inherent asymmetry of γ-TuRC. To further investigate the role of γ-TuRC conformational change, we used cryo-electron tomography (cryo-ET) to obtain a 3D reconstruction of γ-TuRC bound to a nucleated MT, providing insight into the post-nucleation state. Rigid-body fitting of our cryo-EM structures into this reconstruction suggests that the MT lattice is nucleated by spokes 2 through 14 of the γ-tubulin helix, which entails spokes 13 and 14 becoming more structured than what is observed in apo γ-TuRC. Together, our results allow us to propose a model for conformational changes in γ-TuRC and how these may facilitate MT formation in a cell.
@article{Romer2023, author = {Romer, Brianna and Travis, Sophie M. and Mahon, Brian P. and McManus, Collin T. and Jeffrey, Philip D. and Coudray, Nicolas and Raghu, Rishwanth and Rale, Michael J. and Zhong, Ellen D. and Bhabha, Gira and Petry, Sabine}, title = {Conformational states of the microtubule nucleator, the γ-tubulin ring complex}, elocation-id = {2023.12.19.572162}, month = dec, year = {2023}, doi = {10.1101/2023.12.19.572162}, publisher = {Cold Spring Harbor Laboratory}, url = {https://www.biorxiv.org/content/early/2023/12/21/2023.12.19.572162}, journal = {bioRxiv} }
Time-resolved cryo-EM (TR-EM) analysis of substrate polyubiquitination by the RING E3 anaphase-promoting complex/cyclosome (APC/C)

Tatyana Bodrug, Kaeli A Welsh, Derek L Bolhuis, Ethan Paulakonis, Raquel C Martinez-Chacin, Bei Liu, Nicholas Pinkin, Thomas Bonacci, Liying Cui, Pengning Xu, Olivia Roscow, Sascha Josef Amann, Irina Grishkovskaya, Michael J Emanuele, Joseph S Harrison, Joshua P Steimel, Klaus M Hahn, Wei Zhang, Ellen D Zhong, David Haselbach, and Nicholas G Brown

Nat. Struct. Mol. Biol., 2023

Substrate polyubiquitination drives a myriad of cellular processes, including the cell cycle, apoptosis and immune responses. Polyubiquitination is highly dynamic, and obtaining mechanistic insight has thus far required artificially trapped structures to stabilize specific steps along the enzymatic process. So far, how any ubiquitin ligase builds a proteasomal degradation signal, which is canonically regarded as four or more ubiquitins, remains unclear. Here we present time-resolved cryogenic electron microscopy studies of the 1.2 MDa E3 ubiquitin ligase, known as the anaphase-promoting complex/cyclosome (APC/C), and its E2 co-enzymes (UBE2C/UBCH10 and UBE2S) during substrate polyubiquitination. Using cryoDRGN (Deep Reconstructing Generative Networks), a neural network-based approach, we reconstruct the conformational changes undergone by the human APC/C during polyubiquitination, directly visualize an active E3–E2 pair modifying its substrate, and identify unexpected interactions between multiple ubiquitins with parts of the APC/C machinery, including its coactivator CDH1. Together, we demonstrate how modification of substrates with nascent ubiquitin chains helps to potentiate processive substrate polyubiquitination, allowing us to model how a ubiquitin ligase builds a proteasomal degradation signal. Here, using cryogenic electron microscopy and cryoDRGN, the authors delineate how the anaphase-promoting complex/cyclosome is reconfigurated to interact with its cognate E2s and thus polyubiquitinate its target. Unexpectedly, multiple ubiquitin moieties are shown to interact with the anaphase-promoting complex/cyclosome machinery, including its activator Cdh1.
@article{Bodrug2023-rr, title = {Time-resolved {cryo-EM} ({TR-EM}) analysis of substrate polyubiquitination by the {RING} {E3} anaphase-promoting complex/cyclosome ({APC/C})}, author = {Bodrug, Tatyana and Welsh, Kaeli A and Bolhuis, Derek L and Paulakonis, Ethan and Martinez-Chacin, Raquel C and Liu, Bei and Pinkin, Nicholas and Bonacci, Thomas and Cui, Liying and Xu, Pengning and Roscow, Olivia and Amann, Sascha Josef and Grishkovskaya, Irina and Emanuele, Michael J and Harrison, Joseph S and Steimel, Joshua P and Hahn, Klaus M and Zhang, Wei and Zhong, Ellen D and Haselbach, David and Brown, Nicholas G}, journal = {Nat. Struct. Mol. Biol.}, publisher = {Nature Publishing Group}, pages = {1--12}, month = sep, year = {2023}, language = {en}, doi = {10.1038/s41594-023-01105-5}, }
Conformational heterogeneity and probability distributions from single-particle cryo-electron microscopy

Wai Shing Tang, Ellen D. Zhong, Sonya M. Hanson, Erik H. Thiede, and Pilar Cossio

Current Opinion in Structural Biology, 2023

Single-particle cryo-electron microscopy (cryo-EM) is a technique that takes projection images of biomolecules frozen at cryogenic temperatures. A major advantage of this technique is its ability to image single biomolecules in heterogeneous conformations. While this poses a challenge for data analysis, recent algorithmic advances have enabled the recovery of heterogeneous conformations from the noisy imaging data. Here, we review methods for the reconstruction and heterogeneity analysis of cryo-EM images, ranging from linear-transformation-based methods to nonlinear deep generative models. We overview the dimensionality-reduction techniques used in heterogeneous 3D reconstruction methods and specify what information each method can infer from the data. Then, we review the methods that use cryo-EM images to estimate probability distributions over conformations in reduced subspaces or predefined by atomistic simulations. We conclude with the ongoing challenges for the cryo-EM community.
@article{TANG2023102626, title = {Conformational heterogeneity and probability distributions from single-particle cryo-electron microscopy}, journal = {Current Opinion in Structural Biology}, volume = {81}, pages = {102626}, month = jan, year = {2023}, issn = {0959-440X}, doi = {10.1016/j.sbi.2023.102626}, url = {https://www.sciencedirect.com/science/article/pii/S0959440X23001008}, author = {Tang, Wai Shing and Zhong, Ellen D. and Hanson, Sonya M. and Thiede, Erik H. and Cossio, Pilar}, }

2022

Amortized Inference for Heterogeneous Reconstruction in Cryo-EM

Axel Levy, Gordon Wetzstein, Julien Martel, Frederic Poitevin, and Ellen D Zhong

In Neural Information Processing Systems (NeurIPS), 2022

Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved. Our method, cryoFIRE, performs ab initio heterogeneous reconstruction with unknown poses in an amortized framework, thereby avoiding the computationally expensive step of pose search while enabling the analysis of conformational heterogeneity. Poses and conformation are jointly estimated by an encoder while a physics-based decoder aggregates the images into an implicit neural representation of the conformational space. We show that our method can provide one order of magnitude speedup on datasets containing millions of images without any loss of accuracy. We validate that the joint estimation of poses and conformations can be amortized over the size of the dataset. For the first time, we prove that an amortized method can extract interpretable dynamic information from experimental datasets.
@inproceedings{levy2022amortized, title = {Amortized Inference for Heterogeneous Reconstruction in Cryo-EM}, author = {Levy, Axel and Wetzstein, Gordon and Martel, Julien and Poitevin, Frederic and Zhong, Ellen D}, booktitle = {Neural Information Processing Systems (NeurIPS)}, month = dec, year = {2022}, }
Latent Space Diffusion Models of Cryo-EM Structures

Karsten Kreis*, Tim Dockhorn*, Zihao Li, and Ellen D Zhong

In NeurIPS Workshop on Machine Learning for Structural Biology (MLSB), 2022

Oral presentation

Cryo-electron microscopy (cryo-EM) is unique among tools in structural biology in its ability to image large, dynamic protein complexes. Key to this ability is image processing algorithms for heterogeneous cryo-EM reconstruction, including recent deep learning-based approaches. The state-of-the-art method cryoDRGN uses a Variational Autoencoder (VAE) framework to learn a continuous distribution of protein structures from single particle cryo-EM imaging data. While cryoDRGN can model complex structural motions, the Gaussian prior distribution of the VAE fails to match the aggregate approximate posterior, which prevents generative sampling of structures especially for multi-modal distributions (e.g. compositional heterogeneity). Here, we train a diffusion model as an expressive, learnable prior in the cryoDRGN framework. Our approach learns a high-quality generative model over molecular conformations directly from cryo-EM imaging data. We show the ability to sample from the model on two synthetic and two real datasets, where samples accurately follow the data distribution unlike samples from the VAE prior distribution. We also demonstrate how the diffusion model prior can be leveraged for fast latent space traversal and interpolation between states of interest. By learning an accurate model of the data distribution, our method unlocks tools in generative modeling, sampling, and distribution analysis for heterogeneous cryo-EM ensembles.
@inproceedings{kreis2022latent, title = {Latent Space Diffusion Models of Cryo-EM Structures}, author = {Kreis*, Karsten and Dockhorn*, Tim and Li, Zihao and Zhong, Ellen D}, booktitle = {NeurIPS Workshop on Machine Learning for Structural Biology (MLSB)}, month = dec, year = {2022}, note = {Oral presentation}, }
Deep generative modeling for volume reconstruction in cryo-electron microscopy

Claire Donnat, Axel Levy, Frederic Poitevin, Ellen D Zhong, and Nina Miolane

Journal of Structural Biology, 2022

Advances in cryo-electron microscopy (cryo-EM) for high-resolution imaging of biomolecules in solution have provided new challenges and opportunities for algorithm development for 3D reconstruction. Next-generation volume reconstruction algorithms that combine generative modelling with end-to-end unsupervised deep learning techniques have shown promise, but many technical and theoretical hurdles remain, especially when applied to experimental cryo-EM images. In light of the proliferation of such methods, we propose here a critical review of recent advances in the field of deep generative modelling for cryo-EM reconstruction. The present review aims to (i) provide a unified statistical framework using terminology familiar to machine learning researchers with no specific background in cryo-EM, (ii) review the current methods in this framework, and (iii) outline outstanding bottlenecks and avenues for improvements in the field.
@article{donnat2022deep, title = {Deep generative modeling for volume reconstruction in cryo-electron microscopy}, author = {Donnat, Claire and Levy, Axel and Poitevin, Frederic and Zhong, Ellen D and Miolane, Nina}, journal = {Journal of Structural Biology}, pages = {107920}, month = dec, year = {2022}, publisher = {Elsevier}, }
Machine Learning for Reconstructing Dynamic Protein Structures from Cryo-EM Images

Ellen D Zhong

Massachusetts Institute of Technology, 2022

Proteins and other biomolecules form dynamic macromolecular machines that carry out essential biological processes responsible for life. However, studying the mechanisms of these biomolecular complexes at relevant atomic-scale resolutions is an extraordinarily challenging task in structural biology. This thesis presents new algorithms that address the computational bottlenecks at the frontier of structure determination of dynamic biomolecular complexes via cryo-electron microscopy (cryo-EM).

In single particle cryo-EM, the central problem is to reconstruct the 3D structure of a target biomolecular complex from a set of noisy and randomly oriented 2D projection images, a challenging inverse problem especially when instances of the imaged biomolecular complex exhibit structural heterogeneity.

The main contribution of this thesis is a machine learning system, cryoDRGN, for reconstructing continuous distributions of biomolecular structures from cryo-EM images. Underpinning the cryoDRGN method is a deep generative model parameterized by a new neural representation of cryo-EM volumes and a learning algorithm to optimize this representation from unlabeled 2D cryo-EM images. Released as an open source software tool, cryoDRGN has been applied on real datasets to uncover heterogeneity in high resolution datasets, discover new conformations of large macromolecular machines and visualize continuous trajectories of their motion. This thesis also describes an extension, cryoDRGN2, for learning this model from unposed images, i.e. ab initio reconstruction. Finally, this thesis presents emerging directions in analyzing the learned manifold of cryo-EM structures and in incorporating atomic model priors into cryo-EM reconstruction.
@phdthesis{zhong2022machine, title = {Machine Learning for Reconstructing Dynamic Protein Structures from Cryo-EM Images}, author = {Zhong, Ellen D}, month = may, year = {2022}, school = {Massachusetts Institute of Technology}, }
Cryo-EM structure of the plant 26S proteasome

Susanne Kandolf, Irina Grishkovskaya, Katarina Belačić, Derek L Bolhuis, Sascha Amann, Brent Foster, Richard Imre, Karl Mechtler, Alexander Schleiffer, Hemant D Tagare, Ellen D Zhong, Anton Meinhard, Nicholas G Brown, and David Haselbach

Plant Communications, 2022

Targeted proteolysis is a hallmark of life. It is especially important in long-lived cells that can be found in higher eukaryotes, like plants. This task is mainly fulfilled by the ubiquitin–proteasome system. Thus, proteolysis by the 26S proteasome is vital to development, immunity, and cell division. Although the yeast and animal proteasomes are well characterized, there is only limited information on the plant proteasome. We determined the first plant 26S proteasome structure from Spinacia oleracea by single-particle electron cryogenic microscopy at an overall resolution of 3.3 Å. We found an almost identical overall architecture of the spinach proteasome compared with the known structures from mammals and yeast. Nevertheless, we noticed a structural difference in the proteolytic active β1 subunit. Furthermore, we uncovered an unseen compression state by characterizing the proteasome’s conformational landscape. We suspect that this new conformation of the 20S core protease, in correlation with a partial opening of the unoccupied gate, may contribute to peptide release after proteolysis. Our data provide a structural basis for the plant proteasome, which is crucial for further studies.
@article{kandolf2022cryo, title = {Cryo-EM structure of the plant 26S proteasome}, author = {Kandolf, Susanne and Grishkovskaya, Irina and Bela{\v{c}}i{\'c}, Katarina and Bolhuis, Derek L and Amann, Sascha and Foster, Brent and Imre, Richard and Mechtler, Karl and Schleiffer, Alexander and Tagare, Hemant D and Zhong, Ellen D and Meinhard, Anton and Brown, Nicholas G and Haselbach, David}, journal = {Plant Communications}, volume = {3}, number = {3}, pages = {100310}, year = {2022}, publisher = {Elsevier}, }
Conformational landscape of the yeast SAGA complex as revealed by cryo-EM

Diana Vasyliuk, Joeseph Felt, Ellen D Zhong, Bonnie Berger, Joseph H Davis, and Calvin K Yip

Scientific Reports, 2022

Spt-Ada-Gcn5-Acetyltransferase (SAGA) is a conserved multi-subunit complex that activates RNA polymerase II-mediated transcription by acetylating and deubiquitinating nucleosomal histones and by recruiting TATA box binding protein (TBP) to DNA. The prototypical yeast Saccharomyces cerevisiae SAGA contains 19 subunits that are organized into Tra1, core, histone acetyltransferase, and deubiquitination modules. Recent cryo-electron microscopy studies have generated high-resolution structural information on the Tra1 and core modules of yeast SAGA. However, the two catalytical modules were poorly resolved due to conformational flexibility of the full assembly. Furthermore, the high sample requirement created a formidable barrier to further structural investigations of SAGA. Here, we report a workflow for isolating/stabilizing yeast SAGA and preparing cryo-EM specimens at low protein concentration using a graphene oxide support layer. With this procedure, we were able to determine a cryo-EM reconstruction of yeast SAGA at 3.1 Å resolution and examine its conformational landscape with the neural network-based algorithm cryoDRGN. Our analysis revealed that SAGA adopts a range of conformations with its HAT module and central core in different orientations relative to Tra1.
@article{vasyliuk2022conformational, title = {Conformational landscape of the yeast SAGA complex as revealed by cryo-EM}, author = {Vasyliuk, Diana and Felt, Joeseph and Zhong, Ellen D and Berger, Bonnie and Davis, Joseph H and Yip, Calvin K}, journal = {Scientific Reports}, volume = {12}, number = {1}, pages = {1--9}, year = {2022}, publisher = {Nature Publishing Group}, }
Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN

Laurel F Kinman*, Barrett M Powell*, Ellen D Zhong*⁺, Bonnie Berger⁺, and Joseph H Davis⁺

Nature Protocols, 2022

Single-particle cryogenic electron microscopy (cryo-EM) has emerged as a powerful technique to visualize the structural landscape sampled by a protein complex. However, algorithmic and computational bottlenecks in analyzing heterogeneous cryo-EM datasets have prevented the full realization of this potential. CryoDRGN is a machine learning system for heterogeneous cryo-EM reconstruction of proteins and protein complexes from single-particle cryo-EM data. Central to this approach is a deep generative model for heterogeneous cryo-EM density maps, which we empirically find is effective in modeling both discrete and continuous forms of structural variability. Once trained, cryoDRGN is capable of generating an arbitrary number of 3D density maps, and thus interpreting the resulting ensemble is a challenge. Here, we showcase interactive and automated processing approaches for analyzing cryoDRGN results. Specifically, we detail a step-by-step protocol for the analysis of an existing assembling 50S ribosome dataset, including preparation of inputs, network training and visualization of the resulting ensemble of density maps. Additionally, we describe and implement methods to comprehensively analyze and interpret the distribution of volumes with the assistance of an associated atomic model. This protocol is appropriate for structural biologists familiar with processing single-particle cryo-EM datasets and with moderate experience navigating Python and Jupyter notebooks. It requires 3–4 days to complete. CryoDRGN is open source software that is freely available.
@article{kinman2022uncovering, title = {Uncovering structural ensembles from single-particle cryo-EM data using cryoDRGN}, author = {Kinman*, Laurel F and Powell*, Barrett M and Zhong*+, Ellen D and Berger+, Bonnie and Davis+, Joseph H}, journal = {Nature Protocols}, pages = {1--31}, year = {2022}, publisher = {Nature Publishing Group}, }

2021

CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images

Ellen D Zhong, Adam Lerer, Joseph H Davis, and Bonnie Berger

In International Conference on Computer Vision (ICCV), 2021

Protein structure determination from cryo-EM data requires reconstructing a 3D volume (or distribution of volumes) from many noisy and randomly oriented 2D projection images. While the standard homogeneous reconstruction task aims to recover a single static structure, recently-proposed neural and non-neural methods can reconstruct distributions of structures, thereby enabling the study of protein complexes that possess intrinsic structural or conformational heterogeneity. These heterogeneous reconstruction methods, however, require fixed image poses, which are typically estimated from an upstream homogeneous reconstruction and are not guaranteed to be accurate under highly heterogeneous conditions. In this work we describe cryoDRGN2, an ab initio reconstruction algorithm, which can jointly estimate image poses and learn a neural model of a distribution of 3D structures on real heterogeneous cryo-EM data. To achieve this, we adapt search algorithms from the traditional cryo-EM literature, and describe the optimizations and design choices required to make such a search procedure computationally tractable in the neural model setting. We show that cryoDRGN2 is robust to the high noise levels of real cryo-EM images, trains faster than earlier neural methods, and achieves state-of-the-art performance on real cryo-EM datasets.
@inproceedings{zhong2021cryodrgn2, title = {CryoDRGN2: Ab Initio Neural Reconstruction of 3D Protein Structures From Real Cryo-EM Images}, author = {Zhong, Ellen D and Lerer, Adam and Davis, Joseph H and Berger, Bonnie}, booktitle = {International Conference on Computer Vision (ICCV)}, pages = {4066--4075}, month = may, year = {2021}, }
CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks

Ellen D Zhong, Tristan Bepler, Bonnie Berger⁺, and Joseph H Davis⁺

Nature Methods, 2021

Cryo-electron microscopy (cryo-EM) single-particle analysis has proven powerful in determining the structures of rigid macromolecules. However, many imaged protein complexes exhibit conformational and compositional heterogeneity that poses a major challenge to existing three-dimensional reconstruction methods. Here, we present cryoDRGN, an algorithm that leverages the representation power of deep neural networks to directly reconstruct continuous distributions of 3D density maps and map per-particle heterogeneity of single-particle cryo-EM datasets. Using cryoDRGN, we uncovered residual heterogeneity in high-resolution datasets of the 80S ribosome and the RAG complex, revealed a new structural state of the assembling 50S ribosome, and visualized large-scale continuous motions of a spliceosome complex. CryoDRGN contains interactive tools to visualize a dataset’s distribution of per-particle variability, generate density maps for exploratory analysis, extract particle subsets for use with other tools and generate trajectories to visualize molecular motions. CryoDRGN is open-source software freely available at http://cryodrgn.csail.mit.edu.
@article{zhong2021cryodrgn, title = {CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks}, author = {Zhong, Ellen D and Bepler, Tristan and Berger+, Bonnie and Davis+, Joseph H}, journal = {Nature Methods}, volume = {18}, number = {2}, pages = {176--185}, month = feb, year = {2021}, publisher = {Nature Publishing Group}, doi = {10.1038/s41592-020-01049-4}, }
Structures of radial spokes and associated complexes important for ciliary motility

Miao Gui, Meisheng Ma, Erica Sze-Tu, Xiangli Wang, Fujiet Koh, Ellen D Zhong, Bonnie Berger, Joseph H Davis, Susan K Dutcher, Rui Zhang⁺, and Alan Brown⁺

Nature Structural & Molecular Biology, 2021

In motile cilia, a mechanoregulatory network is responsible for converting the action of thousands of dynein motors bound to doublet microtubules into a single propulsive waveform. Here, we use two complementary cryo-EM strategies to determine structures of the major mechanoregulators that bind ciliary doublet microtubules in Chlamydomonas reinhardtii. We determine structures of isolated radial spoke RS1 and the microtubule-bound RS1, RS2 and the nexin–dynein regulatory complex (N-DRC). From these structures, we identify and build atomic models for 30 proteins, including 23 radial-spoke subunits. We reveal how mechanoregulatory complexes dock to doublet microtubules with regular 96-nm periodicity and communicate with one another. Additionally, we observe a direct and dynamically coupled association between RS2 and the dynein motor inner dynein arm subform c (IDAc), providing a molecular basis for the control of motor activity by mechanical signals. These structures advance our understanding of the role of mechanoregulation in defining the ciliary waveform.
@article{gui2020structures, title = {Structures of radial spokes and associated complexes important for ciliary motility}, author = {Gui, Miao and Ma, Meisheng and Sze-Tu, Erica and Wang, Xiangli and Koh, Fujiet and Zhong, Ellen D and Berger, Bonnie and Davis, Joseph H and Dutcher, Susan K and Zhang+, Rui and Brown+, Alan}, journal = {Nature Structural \& Molecular Biology}, year = {2021}, month = jan, }
Learning the language of viral evolution and escape

Brian Hie, Ellen D Zhong, Bonnie Berger⁺, and Bryan Bryson⁺

Science, 2021

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.
@article{hie2021learning, title = {Learning the language of viral evolution and escape}, author = {Hie, Brian and Zhong, Ellen D and Berger+, Bonnie and Bryson+, Bryan}, journal = {Science}, volume = {371}, number = {6526}, pages = {284--288}, year = {2021}, publisher = {American Association for the Advancement of Science}, doi = {10.1126/science.abd7331}, }

2020

Learning mutational semantics

Brian Hie, Ellen D Zhong, Bryan Bryson, and Bonnie Berger

In Neural Information Processing Systems (NeurIPS), 2020

In many natural domains, changing a small part of an entity can transform its semantics; for example, a single word change can alter the meaning of a sentence, or a single amino acid change can mutate a viral protein to escape antiviral treatment or immunity. Although identifying such mutations can be desirable (for example, therapeutic design that anticipates avenues of viral escape), the rules governing semantic change are often hard to quantify. Here, we introduce the problem of identifying mutations with a large effect on semantics, but where valid mutations are under complex constraints (for example, English grammar or biological viability), which we refer to as constrained semantic change search (CSCS). We propose an unsupervised solution based on language models that simultaneously learn continuous latent representations. We report good empirical performance on CSCS of single-word mutations to news headlines, map a continuous semantic space of viral variation, and, notably, show unprecedented zero-shot prediction of single-residue escape mutations to key influenza and HIV proteins, suggesting a productive link between modeling natural language and pathogenic evolution.
@inproceedings{hie2020learning, title = {Learning mutational semantics}, author = {Hie, Brian and Zhong, Ellen D and Bryson, Bryan and Berger, Bonnie}, booktitle = {Neural Information Processing Systems (NeurIPS)}, volume = {33}, pages = {9109--9121}, month = dec, year = {2020}, }
Exploring generative atomic models in cryo-EM reconstruction

Ellen D Zhong, Adam Lerer, Joseph H Davis, and Bonnie Berger

In NeurIPS Workshop on Machine Learning for Structural Biology (MLSB), 2020

Cryo-EM reconstruction algorithms seek to determine a molecule’s 3D density map from a series of noisy, unlabeled 2D projection images captured with an electron microscope. Although reconstruction algorithms typically model the 3D volume as a generic function parameterized as a voxel array or neural network, the underlying atomic structure of the protein of interest places well-defined physical constraints on the reconstructed structure. In this work, we exploit prior information provided by an atomic model to reconstruct distributions of 3D structures from a cryo-EM dataset. We propose Cryofold, a generative model for a continuous distribution of 3D volumes based on a coarse-grained model of the protein’s atomic structure, with radial basis functions used to model atom locations and their physics-based constraints. Although the reconstruction objective is highly non-convex when formulated in terms of atomic coordinates (similar to the protein folding problem), we show that gradient descent-based methods can reconstruct a continuous distribution of atomic structures when initialized from a structure within the underlying distribution. This approach is a promising direction for integrating biophysical simulation, learned neural models, and experimental data for 3D protein structure determination.
@inproceedings{zhong2021exploring, title = {Exploring generative atomic models in cryo-EM reconstruction}, author = {Zhong, Ellen D and Lerer, Adam and Davis, Joseph H and Berger, Bonnie}, booktitle = {NeurIPS Workshop on Machine Learning for Structural Biology (MLSB)}, journal = {arXiv preprint arXiv:2107.01331}, month = dec, year = {2020}, }
RNA timestamps identify the age of single molecules in RNA sequencing

Samuel G Rodriques, Linlin M Chen, Sophia Liu, Ellen D Zhong, Joseph R Scherrer, Edward S Boyden⁺, and Fei Chen⁺

Nature Biotechnology, 2020

Current approaches to single-cell RNA sequencing (RNA-seq) provide only limited information about the dynamics of gene expression. Here we present RNA timestamps, a method for inferring the age of individual RNAs in RNA-seq data by exploiting RNA editing. To introduce timestamps, we tag RNA with a reporter motif consisting of multiple MS2 binding sites that recruit the adenosine deaminase ADAR2 fused to an MS2 capsid protein. ADAR2 binding to tagged RNA causes A-to-I edits to accumulate over time, allowing the age of the RNA to be inferred with hour-scale accuracy. By combining observations of multiple timestamped RNAs driven by the same promoter, we can determine when the promoter was active. We demonstrate that the system can infer the presence and timing of multiple past transcriptional events. Finally, we apply the method to cluster single cells according to the timing of past transcriptional activity. RNA timestamps will allow the incorporation of temporal information into RNA-seq workflows.
@article{rodriques2020rna, title = {RNA timestamps identify the age of single molecules in RNA sequencing}, author = {Rodriques, Samuel G and Chen, Linlin M and Liu, Sophia and Zhong, Ellen D and Scherrer, Joseph R and Boyden+, Edward S and Chen+, Fei}, journal = {Nature Biotechnology}, pages = {1--6}, year = {2020}, month = oct, publisher = {Nature Publishing Group}, }
Reconstructing continuous distributions of 3D protein structure from cryo-EM images

Ellen D Zhong, Tristan Bepler, Joseph H Davis, and Bonnie Berger

In International Conference on Learning Representations (ICLR), 2020

Spotlight presentation

Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structure of proteins and other macromolecular complexes at near-atomic resolution. In single particle cryo-EM, the central problem is to reconstruct the three-dimensional structure of a macromolecule from 10⁴ to 10⁷ noisy and randomly oriented two-dimensional projections. However, the imaged protein complexes may exhibit structural variability, which complicates reconstruction and is typically addressed using discrete clustering approaches that fail to capture the full range of protein dynamics. Here, we introduce a novel method for cryo-EM reconstruction that extends naturally to modeling continuous generative factors of structural heterogeneity. This method encodes structures in Fourier space using coordinate-based deep neural networks, and trains these networks from unlabeled 2D cryo-EM images by combining exact inference over image orientation with variational inference for structural heterogeneity. We demonstrate that the proposed method, termed cryoDRGN, can perform ab initio reconstruction of 3D protein complexes from simulated and real 2D cryo-EM image data. To our knowledge, cryoDRGN is the first neural network-based approach for cryo-EM reconstruction and the first end-to-end method for directly reconstructing continuous ensembles of protein structures from cryo-EM images.
@inproceedings{zhong2020reconstructing, title = {Reconstructing continuous distributions of 3D protein structure from cryo-EM images}, author = {Zhong, Ellen D and Bepler, Tristan and Davis, Joseph H and Berger, Bonnie}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2020}, month = may, note = {Spotlight presentation}, }

2019

Explicitly disentangling image content from translation and rotation with spatial-VAE

Tristan Bepler, Ellen D Zhong, Kotaro Kelley, Edward Brignole, and Bonnie Berger

In Neural Information Processing Systems (NeurIPS), 2019

Given an image dataset, we are often interested in finding data generative factors that encode semantic content independently from pose variables such as rotation and translation. However, current disentanglement approaches do not impose any specific structure on the learned latent representations. We propose a method for explicitly disentangling image rotation and translation from other unstructured latent factors in a variational autoencoder (VAE) framework. By formulating the generative model as a function of the spatial coordinate, we make the reconstruction error differentiable with respect to latent translation and rotation parameters. This formulation allows us to train a neural network to perform approximate inference on these latent variables while explicitly constraining them to only represent rotation and translation. We demonstrate that this framework, termed spatial-VAE, effectively learns latent representations that disentangle image rotation and translation from content and improves reconstruction over standard VAEs on several benchmark datasets, including applications to modeling continuous 2-D views of proteins from single particle electron microscopy and galaxies in astronomical images.
@inproceedings{bepler2019explicitly, title = {Explicitly disentangling image content from translation and rotation with spatial-VAE}, author = {Bepler, Tristan and Zhong, Ellen D and Kelley, Kotaro and Brignole, Edward and Berger, Bonnie}, booktitle = {Neural Information Processing Systems (NeurIPS)}, pages = {15435--15445}, year = {2019}, }

2017

Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset

Michael R Shirts, Christoph Klein, Jason M Swails, Jian Yin, Michael K Gilson, David L Mobley, David A Case, and Ellen D Zhong

Journal of Computer-Aided Molecular Design, 2017

We describe our efforts to prepare common starting structures and models for the SAMPL5 blind prediction challenge. We generated the starting input files and single configuration potential energies for the host-guest in the SAMPL5 blind prediction challenge for the GROMACS, AMBER, LAMMPS, DESMOND and CHARMM molecular simulation programs. All conversions were fully automated from the originally prepared AMBER input files using a combination of the ParmEd and InterMol conversion programs. We find that the energy calculations for all molecular dynamics engines for this molecular set agree to better than 0.1% relative absolute energy for all energy components, and in most cases an order of magnitude better, when reasonable choices are made for different cutoff parameters. However, there are some surprising sources of statistically significant differences. Most importantly, different choices of Coulomb’s constant between programs are one of the largest sources of discrepancies in energies. We discuss the measures required to get good agreement in the energies for equivalent starting configurations between the simulation programs, and the energy differences that occur when simulations are run with program-specific default simulation parameter values. Finally, we discuss what was required to automate this conversion and comparison.
@article{shirts2017lessons, title = {Lessons learned from comparing molecular dynamics engines on the SAMPL5 dataset}, author = {Shirts, Michael R and Klein, Christoph and Swails, Jason M and Yin, Jian and Gilson, Michael K and Mobley, David L and Case, David A and Zhong, Ellen D}, journal = {Journal of Computer-Aided Molecular Design}, volume = {31}, number = {1}, pages = {147--161}, year = {2017}, publisher = {Springer}, }

2014

Thermodynamics of coupled protein adsorption and stability using hybrid Monte Carlo simulations

Ellen D Zhong and Michael R Shirts

Langmuir, 2014

A better understanding of changes in protein stability upon adsorption can improve the design of protein separation processes. In this study, we examine the coupling of the folding and the adsorption of a model protein, the B1 domain of streptococcal protein G, as a function of surface attraction using a hybrid Monte Carlo (HMC) approach with temperature replica exchange and umbrella sampling. In our HMC implementation, we are able to use a molecular dynamics (MD) time step that is an order of magnitude larger than in a traditional MD simulation protocol and observe a factor of 2 enhancement in the folding and unfolding rate. To demonstrate the convergence of our systems, we measure the travel of our order parameter, the fraction of native contacts, between folded and unfolded states throughout the length of our simulations. Thermodynamic quantities are extracted with minimum statistical variance using multistate reweighting between simulations at different temperatures and harmonic distance restraints from the surface. The resultant free energies, enthalpies, and entropies of the coupled unfolding and adsorption processes are in qualitative agreement with previous experimental and computational observations, including entropic stabilization of the adsorbed, folded state relative to the bulk on surfaces with low attraction.
@article{zhong2014thermodynamics, title = {Thermodynamics of coupled protein adsorption and stability using hybrid Monte Carlo simulations}, author = {Zhong, Ellen D and Shirts, Michael R}, journal = {Langmuir}, volume = {30}, number = {17}, pages = {4952--4961}, year = {2014}, publisher = {ACS Publications}, }

2012

Areas of permanent shadow in Mercury’s south polar region ascertained by MESSENGER orbital imaging

Nancy L Chabot, Carolyn M Ernst, Brett W Denevi, John K Harmon, Scott L Murchie, David T Blewett, Sean C Solomon, and Ellen D Zhong

Geophysical Research Letters, 2012

Radar-bright features near Mercury’s poles have been postulated to be deposits of water ice trapped in cold, permanently shadowed interiors of impact craters. From its orbit about Mercury, MESSENGER repeatedly imaged the planet’s south polar region over one Mercury solar day, providing a complete view of the terrain near the south pole and enabling the identification of areas of permanent shadow larger in horizontal extent than approximately 4 km. In Mercury’s south polar region, all radar-bright features correspond to areas of permanent shadow. Application of previous thermal models suggests that the radar-bright deposits in Mercury’s south polar cold traps are in locations consistent with a composition dominated by water ice provided that some manner of insulation, such as a thin layer of regolith, covers many of the deposits.
@article{chabot2012areas, title = {Areas of permanent shadow in Mercury's south polar region ascertained by MESSENGER orbital imaging}, author = {Chabot, Nancy L and Ernst, Carolyn M and Denevi, Brett W and Harmon, John K and Murchie, Scott L and Blewett, David T and Solomon, Sean C and Zhong, Ellen D}, journal = {Geophysical Research Letters}, volume = {39}, number = {9}, year = {2012}, publisher = {Wiley Online Library}, }