2025Fu T2Relion

From 3DEM-Methods

Citation

Fu, J., Xu, J., Gan, L., Mao, T., Shen, Z., Wang, Y., Song, Z., Duan, X., Xue, W. and Yang, G. 2025. T2-RELION: Task Parallelism, Tensor Core Accelerated RELION for Cryo-EM 3D Reconstruction. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2025), 2186–2202.

Abstract

Cryo-electron microscopy (cryo-EM) is a key technique for structural biology, but its computational efficiency, particularly during 3D reconstruction, remains a bottleneck. We introduce T2-RELION, a highly optimized version of RELION for cryo-EM 3D reconstruction on CPU-GPU platforms. RELION is a widely used open-source package in the cryo-EM community. We identify and resolve key inefficiencies in RELION’s parallelization strategy and memory management by proposing task parallelism and a three-phase GPU memory management strategy. Furthermore, we leverage Tensor Cores to accelerate the hot-spot kernel for difference calculation, employing an advanced pipelining strategy to hide latency and enable thread-block-level data reuse. On a quad-A100 GPU machine, performance evaluations demonstrate that T2-RELION outperforms RELION 4.0. For the hot-spot kernel, our optimizations achieve a 1.90 to 23.7 times speedup. For the whole application using the CNG and Trpv1 datasets, we observe 3.86 times and 2.68 times speedups, respectively.
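The abstract does not say how the difference calculation is mapped onto Tensor Cores. The sketch below assumes one common approach, which this page does not confirm. A weighted squared difference between an experimental image and a reference projection can be expanded into two norm terms plus a cross term, and the cross term over all image/projection pairs is a matrix multiply. The function names `diff2_direct` and `diff2_gemm_form` and the per-pixel `weights` (standing in for a noise weight) are hypothetical illustrations, not RELION or T2-RELION code.

```python
# Hypothetical sketch (not RELION code): weighted squared-difference scores
# between every experimental image and every reference projection.
# Each image/projection is a list of complex Fourier coefficients; `weights`
# is an assumed per-pixel noise weight (e.g. 1 / (2 * sigma^2)).

def diff2_direct(images, projections, weights):
    """Naive form: sum_k w_k * |x_k - a_k|^2 for every (image, projection) pair."""
    return [
        [sum(w * abs(x - a) ** 2 for x, a, w in zip(img, proj, weights))
         for proj in projections]
        for img in images
    ]


def diff2_gemm_form(images, projections, weights):
    """Same scores via |x - a|^2 = |x|^2 + |a|^2 - 2 * Re(conj(x) * a).

    The cross term is a weighted inner product over pixels for every pair,
    i.e. a matrix multiply, which is the part a Tensor Core GEMM could take.
    """
    img_norm = [sum(w * abs(x) ** 2 for x, w in zip(img, weights)) for img in images]
    proj_norm = [sum(w * abs(a) ** 2 for a, w in zip(proj, weights)) for proj in projections]
    # Cross term: one weighted complex inner product per (image, projection) pair.
    cross = [
        [sum(w * (x.conjugate() * a).real for x, a, w in zip(img, proj, weights))
         for proj in projections]
        for img in images
    ]
    return [
        [img_norm[i] + proj_norm[j] - 2.0 * cross[i][j] for j in range(len(projections))]
        for i in range(len(images))
    ]
```

Both functions return the same score matrix up to rounding. The second isolates the pairwise inner products, which a batched GEMM could compute in bulk while the norm terms are computed once per image and once per projection.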

Keywords

https://dl.acm.org/doi/full/10.1145/3712285.3759824

Comments