Authors:
(1) JunJie Wee, Department of Mathematics, Michigan State University;
(2) Jiahui Chen, Department of Mathematical Sciences, University of Arkansas;
(3) Kelin Xia, Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University & xiakelin@ntu.edu.sg;
(4) Guo-Wei Wei, Department of Mathematics, Michigan State University, Department of Biochemistry and Molecular Biology, Michigan State University, Department of Electrical and Computer Engineering, Michigan State University & xiakelin@ntu.edu.sg.
Supporting Information, Acknowledgments & References
Supporting Information is available for supplementary tables, figures, and methods.
This work was supported in part by NIH grants R01GM126189, R01AI164266, and R01AI146210, NSF grants DMS-2052983, DMS-1761320, and IIS-1900473, NASA grant 80NSSC21M0023, MSU Foundation, Bristol-Myers Squibb 65109, and Pfizer. It was supported in part by Nanyang Technological University Startup Grant M4081842.110, Singapore Ministry of Education Academic Research fund Tier 1 RG109/19 and Tier 2 MOE-T2EP20120-0013, MOE-T2EP20220-0010, and MOE-T2EP20221-0003.
[1] Y. Qiu and G.-W. Wei, “Persistent spectral theory-guided protein engineering,” Nature Computational Science, vol. 3, no. 2, pp. 149–163, 2023.
[2] R. Guerois, J. E. Nielsen, and L. Serrano, “Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations,” Journal of molecular biology, vol. 320, no. 2, pp. 369–387, 2002.
[3] P. Sormanni, F. A. Aprile, and M. Vendruscolo, “The CamSol method of rational design of protein mutants with enhanced solubility,” Journal of molecular biology, vol. 427, no. 2, pp. 478–490, 2015.
[4] Y. Tian, C. Deutsch, and B. Krishnamoorthy, “Scoring function to predict solubility mutagenesis,” Algorithms for Molecular Biology, vol. 5, no. 1, pp. 1–11, 2010.
[5] Y. Yang, A. Niroula, B. Shen, and M. Vihinen, “PON-Sol: Prediction of effects of amino acid substitutions on protein solubility,” Bioinformatics, vol. 32, no. 13, pp. 2032–2034, 2016.
[6] L. Paladin, D. Piovesan, and S. C. Tosatto, “SODA: Prediction of protein solubility from disorder and aggregation propensity,” Nucleic acids research, vol. 45, no. W1, pp. W236–W240, 2017.
[7] J. Van Durme, G. De Baets, R. Van Der Kant, M. Ramakers, A. Ganesan, H. Wilkinson, R. Gallardo, F. Rousseau, and J. Schymkowitz, “Solubis: a webserver to reduce protein aggregation through mutation,” Protein Engineering, Design and Selection, vol. 29, no. 8, pp. 285–289, 2016.
[8] M. Vihinen, “Solubility of proteins,” ADMET and DMPK, vol. 8, no. 4, pp. 391–399, 2020.
[9] A.-M. Fernandez-Escamilla, F. Rousseau, J. Schymkowitz, and L. Serrano, “Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins,” Nature Biotechnology, vol. 22, no. 10, pp. 1302–1306, 2004.
[10] H. Land and M. S. Humble, “YASARA: A tool to obtain structural guidance in biocatalytic investigations,” Protein engineering: methods and protocols, pp. 43–67, 2018.
[11] Y. Yang, L. Zeng, and M. Vihinen, “PON-Sol2: Prediction of effects of variants on protein solubility,” International Journal of Molecular Sciences, vol. 22, no. 15, p. 8027, 2021.
[12] H. Edelsbrunner and J. L. Harer, Computational Topology: An Introduction. American Mathematical Society, 2022.
[13] A. Zomorodian and G. Carlsson, “Computing Persistent Homology,” in Proceedings of the twentieth annual symposium on Computational geometry, pp. 347–356, 2004.
[14] K. Xia and G.-W. Wei, “Persistent homology analysis of protein structure, flexibility, and folding,” International journal for numerical methods in biomedical engineering, vol. 30, no. 8, pp. 814–844, 2014.
[15] Z. Cang, L. Mu, K. Wu, K. Opron, K. Xia, and G.-W. Wei, “A topological approach for protein classification,” Computational and Mathematical Biophysics, vol. 3, no. 1, 2015.
[16] Z. Cang and G.-W. Wei, “TopologyNet: Topology based deep convolutional and multitask neural networks for biomolecular property predictions,” PLoS computational biology, vol. 13, no. 7, p. e1005690, 2017.
[17] Z. X. Cang and G. W. Wei, “Integration of element specific persistent homology and machine learning for protein-ligand binding affinity prediction,” International journal for numerical methods in biomedical engineering, vol. 34, no. 2, p. e2914, 2018.
[18] M. Wang, Z. Cang, and G.-W. Wei, “A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation,” Nature Machine Intelligence, vol. 2, no. 2, pp. 116–123, 2020.
[19] J. Chen, R. Wang, M. Wang, and G.-W. Wei, “Mutations strengthened SARS-CoV-2 infectivity,” Journal of molecular biology, vol. 432, no. 19, pp. 5212–5226, 2020.
[20] D. D. Nguyen, Z. Cang, K. Wu, M. Wang, Y. Cao, and G.-W. Wei, “Mathematical deep learning for pose and binding affinity prediction and ranking in D3R grand challenges,” Journal of computer-aided molecular design, vol. 33, pp. 71–82, 2019.
[21] D. D. Nguyen, K. Gao, M. Wang, and G.-W. Wei, “MathDL: mathematical deep learning for D3R grand challenge 4,” Journal of computer-aided molecular design, vol. 34, pp. 131–147, 2020.
[22] D. D. Nguyen, Z. Cang, and G.-W. Wei, “A review of mathematical representations of biomolecular data,” Physical Chemistry Chemical Physics, vol. 22, no. 8, pp. 4343–4367, 2020.
[23] R. Wang, D. D. Nguyen, and G.-W. Wei, “Persistent spectral graph,” arXiv preprint arXiv:1912.04135, 2019.
[24] J. Chen, R. Zhao, Y. Tong, and G.-W. Wei, “Evolutionary de Rham-Hodge method,” arXiv preprint arXiv:1912.12388, 2019.
[25] R. Wang, D. D. Nguyen, and G.-W. Wei, “Persistent spectral graph,” International Journal for Numerical Methods in Biomedical Engineering, p. e3376, 2020.
[26] Z. Meng and K. Xia, “Persistent spectral–based machine learning (PerSpect ML) for protein-ligand binding affinity prediction,” Science Advances, vol. 7, no. 19, p. eabc5329, 2021.
[27] J. Wee and K. Xia, “Persistent spectral based ensemble learning (PerSpect-EL) for protein–protein binding affinity prediction,” Briefings in Bioinformatics, p. bbac024, 2022.
[28] J. Bi, J. Wee, X. Liu, C. Qu, G. Wang, and K. Xia, “Multiscale Topological Indices for the Quantitative Prediction of SARS CoV-2 Binding Affinity Change upon Mutations,” Journal of Chemical Information and Modeling, vol. 63, no. 13, pp. 4216–4227, 2023.
[29] J. Chen, Y. Qiu, R. Wang, and G.-W. Wei, “Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants,” Computers in Biology and Medicine, vol. 151, p. 106262, 2022.
[30] R. Rao, N. Bhattacharya, N. Thomas, Y. Duan, P. Chen, J. Canny, P. Abbeel, and Y. Song, “Evaluating protein transfer learning with TAPE,” Advances in neural information processing systems, vol. 32, 2019.
[31] T. Bepler and B. Berger, “Learning protein sequence embeddings using information from structure,” in International Conference on Learning Representations, 2018.
[32] E. C. Alley, G. Khimulya, S. Biswas, M. AlQuraishi, and G. M. Church, “Unified rational protein engineering with sequence-based deep representation learning,” Nature methods, vol. 16, no. 12, pp. 1315–1322, 2019.
[33] A. Rives, J. Meier, T. Sercu, S. Goyal, Z. Lin, J. Liu, D. Guo, M. Ott, C. L. Zitnick, J. Ma, et al., “Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences,” Proceedings of the National Academy of Sciences, vol. 118, no. 15, p. e2016239118, 2021.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[35] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in North American Chapter of the Association for Computational Linguistics, 2019.
[36] J. Meier, R. Rao, R. Verkuil, J. Liu, T. Sercu, and A. Rives, “Language models enable zero-shot prediction of the effects of mutations on protein function,” in Advances in Neural Information Processing Systems (M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, eds.), vol. 34, pp. 29287–29303, Curran Associates, Inc., 2021.
[37] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021.
[38] J. Z. Xiang and B. Honig, “Jackal: A protein structure modeling package,” Columbia University and Howard Hughes Medical Institute, New York, 2002.
[39] R. M. Rao, J. Liu, R. Verkuil, J. Meier, J. Canny, P. Abbeel, T. Sercu, and A. Rives, “MSA Transformer,” in International Conference on Machine Learning, pp. 8844–8856, PMLR, 2021.
[40] P. Baldi, S. Brunak, Y. Chauvin, C. A. Andersen, and H. Nielsen, “Assessing the accuracy of prediction algorithms for classification: An Overview,” Bioinformatics, vol. 16, no. 5, pp. 412–424, 2000.
[41] E. D. Levy, “A simple definition of structural regions in proteins and its use in analyzing interface evolution,” Journal of molecular biology, vol. 403, no. 4, pp. 660–670, 2010.
[42] D. S. Goodsell, L. Autin, and A. J. Olson, “Illustrate: Software for biomolecular illustration,” Structure, vol. 27, no. 11, pp. 1716–1720, 2019.
[43] Y. Hozumi, K. A. Tanemura, and G.-W. Wei, “Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection,” Journal of Chemical Information and Modeling, 2023.
[44] J. R. Munkres, Elements of algebraic topology. CRC Press, 2018.
[45] A. J. Zomorodian, Topology for computing, vol. 16. Cambridge university press, 2005.
[46] H. Edelsbrunner and J. Harer, Computational topology: an introduction. American Mathematical Soc., 2010.
[47] K. Mischaikow and V. Nanda, “Morse theory for filtrations and efficient computation of persistent homology,” Discrete and Computational Geometry, vol. 50, no. 2, pp. 330–353, 2013.
[48] D. Horak and J. Jost, “Spectra of combinatorial Laplace operators on simplicial complexes,” Advances in Mathematics, vol. 244, pp. 303–336, 2013.
[49] B. Eckmann, “Harmonische Funktionen und Randwertaufgaben in einem Komplex,” Commentarii Mathematici Helvetici, vol. 17, no. 1, pp. 240–255, 1944.
[50] A. Zomorodian, “Topological data analysis,” Advances in applied and computational topology, vol. 70, pp. 1–39, 2012.
[51] T. G. Project, GUDHI User and Reference Manual. GUDHI Editorial Board, 2015.
[52] M. Mirdita, K. Schütze, Y. Moriwaki, L. Heo, S. Ovchinnikov, and M. Steinegger, “ColabFold: Making protein folding accessible to all,” Nature methods, vol. 19, no. 6, pp. 679–682, 2022.
[53] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[54] T. J. Dolinsky, J. E. Nielsen, J. A. McCammon, and N. A. Baker, “PDB2PQR: An automated pipeline for the setup of Poisson–Boltzmann electrostatics calculations,” Nucleic acids research, vol. 32, no. suppl 2, pp. W665–W667, 2004.
[55] B. Liu, B. Wang, R. Zhao, Y. Tong, and G.-W. Wei, “ESES: Software for Eulerian solvent excluded surface,” 2017.
[56] D. Chen, Z. Chen, C. Chen, W. Geng, and G.-W. Wei, “MIBPB: A software package for electrostatic analysis,” Journal of computational chemistry, vol. 32, no. 4, pp. 756–770, 2011.
[57] H. Li, A. D. Robertson, and J. H. Jensen, “Very fast empirical prediction and rationalization of protein pKa values,” Proteins: Structure, Function, and Bioinformatics, vol. 61, no. 4, pp. 704–721, 2005.
[58] M. Johnson, I. Zaretskaya, Y. Raytselis, Y. Merezhuk, S. McGinnis, and T. L. Madden, “NCBI BLAST: A better web interface,” Nucleic acids research, vol. 36, no. suppl 2, pp. W5–W9, 2008.
[59] R. Heffernan, K. Paliwal, J. Lyons, A. Dehzangi, A. Sharma, J. Wang, A. Sattar, Y. Yang, and Y. Zhou, “Improving prediction of secondary structure, local backbone angles and solvent accessible surface area of proteins by iterative deep learning,” Scientific reports, vol. 5, no. 1, p. 11476, 2015.
[60] C. Maria, J.-D. Boissonnat, M. Glisse, and M. Yvinec, “The GUDHI library: Simplicial complexes and persistent homology,” in Mathematical Software–ICMS 2014: 4th International Congress, Seoul, South Korea, August 5-9, 2014. Proceedings 4, pp. 167–174, Springer, 2014.
This paper is available on arXiv under a CC 4.0 license.