Enhancing Protein Solubility via Glycosylation: From Chemical Synthesis to Machine Learning Predictions

ABSTRACT: Glycosylation is a valuable tool for modulating protein solubility; however, the lack of reliable research strategies has impeded progress in understanding and applying this modification. This study aimed to bridge this gap by investigating the solubility of a model glycoprotein, the carbohydrate-binding module (CBM), in two stages. In the first stage, chemical synthesis, comparative analysis, and molecular dynamics simulations of a library of glycoforms were used to elucidate how different glycosylation patterns affect solubility and to identify the key factors responsible. In the second stage, machine learning algorithms were used to derive a predictive mathematical formula that relates solubility to the identified key factors and accurately predicts the solubility of newly designed glycoforms. This two-stage approach proved feasible and effective, and it offers a valuable strategy for advancing glycosylation research, especially for the discovery of glycoforms with increased solubility.