DeepP450: Predicting Human P450 Activities of Small Molecules by Integrating Pretrained Protein Language Model and Molecular Representation

ABSTRACT: Cytochrome P450 enzymes (CYPs) play a crucial role in Phase I drug metabolism in humans, and CYP activity toward a compound can significantly affect its druggability, so early prediction of CYP activity and identification of CYP substrates are essential for therapeutic development. Here, we established DeepP450, a deep learning model for assessing potential CYP substrates, by fine-tuning pretrained protein and molecule models and integrating their features through cross-attention and self-attention layers. The model achieved high prediction accuracy (0.92) on the test set, with area under the receiver operating characteristic curve (AUROC) values ranging from 0.89 to 0.98 for substrate/nonsubstrate prediction across the nine major human CYPs, surpassing current benchmarks for CYP activity prediction. Notably, DeepP450 uses a single model to predict substrates/nonsubstrates for any of the nine CYPs and shows some generalizability to novel compounds and to different categories of human CYPs, which could greatly facilitate early-stage drug design by helping to avoid CYP-reactive compounds.
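
As an illustration of how per-CYP AUROC values like those reported above can be obtained from a single model's outputs, the following is a minimal sketch, not the authors' evaluation code. It assumes each prediction is stored as a hypothetical (isoform, true label, predicted score) record, groups the records by isoform, and computes a rank-based AUROC for each group. The function names, the record layout, and the toy scores are all assumptions made for this example; the toy numbers are not results from the paper.

```python
from collections import defaultdict


def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen substrate
    (label 1) receives a higher score than a randomly chosen nonsubstrate
    (label 0). Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_isoform_auroc(records):
    """Group (isoform, label, score) records by isoform and score each group.

    The record layout is hypothetical; it only mirrors the idea of one model
    producing predictions for several CYPs that are then evaluated separately.
    """
    grouped = defaultdict(lambda: ([], []))
    for isoform, label, score in records:
        grouped[isoform][0].append(label)
        grouped[isoform][1].append(score)
    return {iso: auroc(lbls, scrs) for iso, (lbls, scrs) in grouped.items()}


# Toy data for illustration only (not taken from the paper).
toy_records = [
    ("CYP3A4", 1, 0.9), ("CYP3A4", 0, 0.2),
    ("CYP3A4", 1, 0.6), ("CYP3A4", 0, 0.7),
    ("CYP2D6", 1, 0.8), ("CYP2D6", 0, 0.1), ("CYP2D6", 0, 0.3),
]

if __name__ == "__main__":
    for isoform, value in per_isoform_auroc(toy_records).items():
        print(f"{isoform}: AUROC = {value:.2f}")
```

The pairwise comparison is quadratic in the number of examples, which is acceptable for a small illustration; for a real test set, a sorted-rank implementation or an established library routine would be the more practical choice.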

For details: