Prediction of Drug-Target Interaction Using Random Forest in Coronavirus Disease 2019 Case
Abstract
Coronavirus disease 2019 is an infectious disease that causes severe respiratory, digestive, and systemic infections and that caused a pandemic beginning in 2019. One focus of drug development against coronavirus disease 2019 is drug repurposing. This study applies a random forest with a feature-based chemogenomics approach to drug-target interaction data for coronavirus disease 2019. Features are extracted from compounds using the PubChem fingerprint and from proteins using amino acid composition. Feature selection with XGBoost reduces the dimensionality of the data, and random undersampling addresses the class imbalance in the dataset. Under cross-validation, the random forest model achieved an average accuracy of 0.98, recall of 0.92, precision of 0.95, AUROC of 0.95, and F1 score of 0.93. When used to predict the original dataset (the dataset without random undersampling), the model achieved an accuracy of 0.99, recall of 0.93, precision of 0.94, AUROC of 0.99, and F1 score (F-measure) of 0.94.
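Three of the steps named in the abstract can be illustrated with a minimal, standard-library-only sketch: amino acid composition as a protein descriptor, random undersampling of the majority class, and the reported classification metrics. This is not the authors' implementation. The function names (`amino_acid_composition`, `random_undersample`, `binary_metrics`) are hypothetical, the 20-letter alphabet ordering is an assumption, and equal class sizes after undersampling is an assumed target ratio. The PubChem fingerprint, the XGBoost feature selection, and the random forest itself are left out because they need third-party libraries.

```python
import random
from collections import Counter

# Assumption: the 20 standard amino acids in a fixed order, so that every
# protein maps to the same 20-dimensional feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def random_undersample(features, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    minority_size = min(len(indices) for indices in by_class.values())
    kept = sorted(
        i for indices in by_class.values() for i in rng.sample(indices, minority_size)
    )
    return [features[i] for i in kept], [labels[i] for i in kept]


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data only: one short peptide and a 2-vs-8 imbalanced label vector.
    print(amino_acid_composition("MKTAYIAKQR"))
    X = [[i] for i in range(10)]
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    X_balanced, y_balanced = random_undersample(X, y, seed=42)
    print(y_balanced)
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

On a heavily imbalanced dataset a model can reach high accuracy by predicting the majority class almost everywhere, which is why recall, precision, and F1 are reported alongside accuracy.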

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.