Behkamal, Bahareh / Naghibzadeh, Mahmoud / Pagnani, Andrea / Saberi, Mohammad Reza / Al Nasr, Kamal. LPTD: A Novel Linear Programming-based Topology Determination Method for Cryo-EM Maps. 2022. Bioinformatics.
Topology determination is one of the most important intermediate steps in building the atomic structure of a protein from its medium-resolution cryo-electron microscopy (cryo-EM) map. The main goal of topology determination is to identify the correct matches (i.e. assignment and direction) between the secondary structure elements (α-helices and β-sheets) detected in the protein sequence and those detected in the cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains challenging. To address it, this article proposes a Linear Programming-based Topology Determination method (LPTD) that solves the secondary structure topology problem in three-dimensional geometrical space. By modeling the protein's sequence, extracting highly reliable features and applying a distance-based scoring function, the secondary structure matching problem is transformed into a complete weighted bipartite graph matching problem. Subsequently, an algorithm based on linear programming is developed as a decision-making strategy to extract the true topology (native topology) from among all possible topologies. The proposed automatic framework is verified using 12 experimental and 15 simulated α-β proteins. Results demonstrate that LPTD is highly efficient and extremely fast: for 77% of the cases in the data set, the native topology is ranked first and is found in less than 2 seconds. In addition, the method successfully handles large, complex proteins with as many as 65 secondary structure elements. A topology with this many secondary structure elements has never been solved by current tools or methods. The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. Moreover, two test samples and instructions for using the graphical user interface (GUI) are provided in the shared readme file. Supplementary data will be available at Bioinformatics online.
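The bipartite matching formulation described above can be illustrated with a minimal sketch. It is not the LPTD algorithm: exhaustive enumeration is substituted for the linear program, which is only workable for a handful of elements, and the direction of each element is omitted. The function name rank_topologies and the 3×3 cost matrix are hypothetical, chosen only to show how candidate topologies could be ranked by a total distance-based score.

```python
from itertools import permutations


def rank_topologies(cost, top_k=3):
    """Rank one-to-one assignments by total cost (lowest first).

    cost[i][j] is an assumed distance-based score for matching sequence
    element i to density element j; lower means a better match.
    This brute-force search stands in for the linear program and
    is feasible only for very small inputs (n! candidates).
    """
    n = len(cost)
    ranked = []
    for perm in permutations(range(n)):
        # perm[i] is the density element assigned to sequence element i.
        total = sum(cost[i][perm[i]] for i in range(n))
        ranked.append((total, perm))
    ranked.sort()
    return ranked[:top_k]


# Hypothetical scores for 3 sequence elements (rows) and
# 3 density elements (columns); not taken from the paper.
cost = [
    [1.0, 4.0, 5.0],
    [3.0, 1.5, 6.0],
    [4.0, 5.0, 2.0],
]

best = rank_topologies(cost)
for total, perm in best:
    print(total, perm)
```

In this toy setting the first-ranked assignment plays the role of the first-rank topology mentioned in the abstract. For realistic sizes, such as the 65 elements reported, enumeration is infeasible, which is the motivation for an optimization-based solver.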