2023Richardson Overfitting

From 3DEM-Methods

Citation

Richardson, Jane S. / Williams, Christopher J. / Chen, Vincent B. / Prisant, Michael G. / Richardson, David C. The bad and the good of trends in model building and refinement for sparse-data regions: pernicious forms of overfitting versus good new tools and predictions. 2023. Acta Crystallographica Section D: Structural Biology, Vol. 79, No. 12

Abstract

Model building and refinement, and the validation of their correctness, are very effective and reliable at local resolutions better than about 2.5 Å for both crystallography and cryo-EM. However, at local resolutions worse than 2.5 Å both the procedures and their validation break down and do not ensure reliably correct models. This is because in the broad density at lower resolution, critical features such as protein backbone carbonyl O atoms are not just less accurate but are not seen at all, and so peptide orientations are frequently wrongly fitted by 90–180 degrees. This puts both backbone and side chains into the wrong local energy minimum, and they are then worsened rather than improved by further refinement into a valid but incorrect rotamer or Ramachandran region. On the positive side, new tools are being developed to locate this type of pernicious error in PDB depositions, such as CaBLAM, EMRinger, Pperp diagnosis of ribose puckers, and peptide flips in PDB-REDO, while interactive modeling in Coot or ISOLDE can help to fix many of them. Another positive trend is that artificial intelligence predictions such as those made by AlphaFold2 contribute additional evidence from large multiple sequence alignments, and in high-confidence parts they provide quite good starting models for loops, termini or whole domains with otherwise ambiguous density.

Keywords

Links

https://journals.iucr.org/d/issues/2023/12/00/qo5006/qo5006.pdf

Related software

Related methods

Comments