2025Mondal EMSequenceFinder

From 3DEM-Methods

Citation

Mondal, D., Kumar, V., Satler, T., Ramachandran, R., Saltzberg, D., Chemmama, I., Pilla, K.B., Echeverria, I., Webb, B.M., Gupta, M. and others 2025. Recognizing amino acid sidechains in a medium-resolution cryo-electron density map. Protein Science. 34, 8 (2025), e70217.

Abstract

Building an accurate atomic structure model of a protein into a cryo-electron microscopy (cryo-EM) map at worse than 3 Å resolution is difficult. To facilitate this task, we devised a method for assigning the amino acid residue sequence to the backbone fragments traced in an input cryo-EM map (EMSequenceFinder). EMSequenceFinder relies on a Bayesian scoring function for ranking 20 standard amino acid residue types at a given backbone position, based on the fit to a density map, map resolution, and secondary structure propensity. The fit to a density is quantified by a convolutional neural network that was trained on ~5.56 million amino acid residue densities extracted from cryo-EM maps at 3–10 Å resolution and corresponding atomic structure models deposited in the Electron Microscopy Data Bank (EMDB). We benchmarked EMSequenceFinder by predicting the sequences of 58,044 distinct α-helix and β-strand fragments, given the fragment backbone coordinates fitted in their density maps. EMSequenceFinder identifies the correct sequence as the best-scoring sequence in 77.8% of these cases. We also assessed EMSequenceFinder on separate datasets of cryo-EM maps at resolutions from 4 to 6 Å. The accuracy of EMSequenceFinder (58%) was better than that of three tested state-of-the-art methods, including findMysequence (45%), ModelAngelo (27%), and sequence_from_map in Phenix (12.9%). We further illustrate EMSequenceFinder by threading the Severe Acute Respiratory Syndrome Coronavirus 2 Non-Structural Protein 2 sequence into eight cryo-EM maps at resolutions from 3.7 to 7.0 Å. EMSequenceFinder is implemented in our open-source Integrative Modeling Platform (IMP) program. Thus, it is expected to be helpful for integrative structure modeling based on a cryo-EM map and other information, such as models of protein complex components and chemical crosslinks between them. EMSequenceFinder is available as part of our open-source IMP distribution at https://integrativemodeling.org/.

Keywords

Links

https://onlinelibrary.wiley.com/doi/full/10.1002/pro.70217

Comments
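
The Bayesian ranking described in the abstract can be illustrated with a toy sketch. This is not the IMP implementation or its API. All function names here (posterior, rank_residue_types, thread_fragment, toy_likelihood) are hypothetical, the convolutional neural network output is replaced by made-up likelihood values, and the dependence on map resolution is assumed to be folded into those likelihoods. The sketch only shows how a per-position likelihood and a secondary-structure prior could be combined with Bayes' rule, and how a fragment could then be slid along a sequence to find the best-scoring placement.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def posterior(likelihood, prior):
    """Bayes' rule: P(type | density) is proportional to P(density | type) * P(type)."""
    unnormalized = {aa: likelihood[aa] * prior[aa] for aa in AMINO_ACIDS}
    total = sum(unnormalized.values())
    return {aa: value / total for aa, value in unnormalized.items()}


def rank_residue_types(likelihood, prior):
    """Return the 20 residue types sorted from most to least probable."""
    post = posterior(likelihood, prior)
    return sorted(AMINO_ACIDS, key=lambda aa: post[aa], reverse=True)


def thread_fragment(posteriors, sequence, floor=1e-12):
    """Slide a fragment along a sequence.

    posteriors: one posterior dict per backbone position in the fragment.
    Returns (best_offset, best_log_score), where the score is the sum of
    log posteriors of the residues in the window (positions assumed independent).
    """
    n = len(posteriors)
    if n > len(sequence):
        raise ValueError("fragment is longer than the sequence")
    best = None
    for offset in range(len(sequence) - n + 1):
        score = sum(
            math.log(max(posteriors[i][sequence[offset + i]], floor))
            for i in range(n)
        )
        if best is None or score > best[1]:
            best = (offset, score)
    return best


def toy_likelihood(favored, strength=5.0):
    """Made-up stand-in for a network output: one type fits the density better."""
    return {aa: (strength if aa == favored else 1.0) for aa in AMINO_ACIDS}


# Assumption: a uniform prior stands in for secondary structure propensities.
uniform_prior = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

fragment_posteriors = [posterior(toy_likelihood(aa), uniform_prior) for aa in "LEK"]
best_offset, best_score = thread_fragment(fragment_posteriors, "GASLEKPT")
```

Summing log posteriors treats backbone positions as independent, which is a simplification chosen for this sketch; the published scoring function may differ.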