Integrating Machine Learning and Molecular Docking for Natural Compounds Discovery
Rasyak M.R.
2025 5th International Conference on Intelligent Cybernetics Technology and Applications (ICICyTA 2025)
Abstract
Breast cancer is recognized as a leading cause of death among women worldwide and shows a high incidence rate in Indonesia. This study applied a computational pipeline integrating machine learning and molecular docking to identify bioactive compounds from three traditional medicinal plants of West Sulawesi (Strobilanthes crispa, Basella alba, and Ficus septica) with potential inhibitory activity against the epidermal growth factor receptor (EGFR), a key therapeutic target in breast cancer. A total of 236 phytochemicals were collected and encoded as 881-bit PubChem substructure fingerprints. A Support Vector Classifier (SVC) was trained on balanced active-decoy datasets and optimized through a grid search over kernel type, regularization (C), and gamma; the optimized model achieved an AUC of 0.98 and an accuracy of 0.99 for EGFR inhibition prediction. From this analysis, 36 valid compounds were identified, with 34 (94%) showing strong binding affinity to EGFR. Among them, chlorogenic acid (-10.93 kcal/mol) and kaempferitrin (-10.81 kcal/mol) exhibited stronger theoretical binding affinity than the reference inhibitor TAK-285 (-10.17 kcal/mol). ADME-toxicity profiling confirmed the drug-likeness and safety potential of the selected compounds. Overall, this integrative approach demonstrates the capability of combining machine learning prediction and structure-based modeling to accelerate the discovery of natural compounds for targeted breast cancer therapy.
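The classifier tuning step described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. This is not the authors' code: the fingerprints below are synthetic 881-bit vectors with an artificial signal, and the grid values, fold count, and the helper names `make_toy_fingerprints` and `tune_svc` are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

N_BITS = 881  # length of a PubChem substructure fingerprint


def make_toy_fingerprints(n_per_class=100, seed=0):
    """Synthetic stand-in for a balanced active/decoy set (not real data)."""
    rng = np.random.default_rng(seed)
    actives = rng.random((n_per_class, N_BITS)) < 0.30
    decoys = rng.random((n_per_class, N_BITS)) < 0.30
    # Give the actives a weak, artificial signal in the first 20 bits.
    actives[:, :20] = rng.random((n_per_class, 20)) < 0.70
    X = np.vstack([actives, decoys]).astype(np.uint8)
    y = np.array([1] * n_per_class + [0] * n_per_class)
    return X, y


def tune_svc(X, y):
    """Grid search over kernel, C, and gamma, scored by cross-validated ROC AUC."""
    # Assumed grid: the abstract names the tuned parameters but not their values.
    param_grid = [
        {"kernel": ["linear"], "C": [0.1, 1, 10]},
        {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]},
    ]
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), param_grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    return search


X, y = make_toy_fingerprints()
search = tune_svc(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

In a real workflow the synthetic matrix would be replaced by fingerprints computed from known EGFR actives and decoys, and the fitted `search.best_estimator_` would then score the plant-derived compounds before docking.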