Dataset
To develop the QSAR models, we extracted drug data from the GDSC website and merged it with descriptors generated by the PaDEL software [1]. This yielded 12 files, one per cell line, each containing the descriptors and the corresponding logIC50 (µM) values.
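The merge step can be illustrated with a minimal pandas sketch. The column names (`drug_name`, `logIC50`, `nC`, `JGI10`), the drug identifiers, and all values below are hypothetical placeholders, not the actual GDSC or PaDEL file layout; the join key in the real files may differ.

```python
import pandas as pd

# Hypothetical drug-response table for one cell line (values are made up).
response = pd.DataFrame({
    "drug_name": ["drugA", "drugB", "drugC", "drugD"],
    "logIC50": [1.2, -0.4, 2.7, 0.9],
})

# Hypothetical PaDEL-style descriptor table; drugD has no descriptors here.
descriptors = pd.DataFrame({
    "drug_name": ["drugA", "drugB", "drugC"],
    "nC": [20, 15, 31],
    "JGI10": [0.012, 0.008, 0.015],
})

# Inner join keeps only drugs that have both descriptors and a response value.
merged = descriptors.merge(response, on="drug_name", how="inner")
print(merged)
```

An inner join is assumed here so that every row in the resulting file has both a descriptor vector and a target value; repeating the merge per cell line would give one file per cell line.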
Descriptors
Descriptors were first reduced using the WEKA tool [2,3]; F-stepping was then applied using Sequential Feature Selection from the Python mlxtend library [4].
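As a rough illustration of sequential feature selection, the sketch below assumes that F-stepping refers to forward stepwise selection. It uses scikit-learn's `SequentialFeatureSelector` as a stand-in for the mlxtend selector named above (the two have similar but not identical interfaces), and the data, estimator, and number of selected features are illustrative assumptions rather than the settings used for these models.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a descriptor matrix: 100 compounds, 6 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
# Only columns 0 and 3 carry signal in this toy response.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)

# Forward selection: start with no descriptors, add the one that most
# improves the cross-validated R^2 at each step, and stop at two descriptors.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=2,
    direction="forward",
    scoring="r2",
    cv=5,
)
selector.fit(X, y)

# Indices of the descriptors that were kept.
selected = np.flatnonzero(selector.get_support()).tolist()
print(selected)
```

In practice the selected column indices would be mapped back to descriptor names before model building.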
Some descriptors commonly found across the 12 cell lines are listed below:
Descriptor | Java class or description |
---|---|
KRFP314 | KlekotaRothFingerprinter |
KRFPC314 | KlekotaRothFingerprintCount |
FP3 | Fingerprinter |
APC2D9_O_I | AtomPairs2DFingerprintCount |
GraphFP252 | GraphOnlyFingerprinter |
JGI10 | Mean topological charge index of order 10 |
KRFP3683 | KlekotaRothFingerprintCount |
KRFP803 | KlekotaRothFingerprintCount |
nC | Number of carbon atoms |
Algorithm
We developed QSAR models for the 12 cell lines using AI/ML algorithms. We used the following algorithms to select the best QSAR models:
About SVM: Support Vector Machine (SVM) is a machine learning algorithm that can be used for both classification and regression tasks. For classification, it finds the optimal hyperplane separating the classes, that is, the one with the largest margin between them. A kernel function implicitly maps the data into a higher-dimensional feature space, which makes SVM particularly useful when the data are not linearly separable. Because the margin acts as a form of regularization, SVM is comparatively robust to overfitting.
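Since logIC50 is a continuous value, the regression form of SVM (support vector regression) is the relevant one here. The following is a minimal sketch of the kernel idea on synthetic data, assuming scikit-learn's `SVR`; the data, the `C` and `epsilon` values, and the choice of kernels are illustrative assumptions, not the settings used for these models.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic one-descriptor data with a nonlinear response (not real data).
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=200)

# Two models that differ only in the kernel; inputs are standardized first
# because SVMs are sensitive to feature scale.
linear_svr = make_pipeline(StandardScaler(), SVR(kernel="linear", C=10.0, epsilon=0.05))
rbf_svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))

# Training-set R^2, for illustration only; real models need held-out data.
linear_r2 = linear_svr.fit(X, y).score(X, y)
rbf_r2 = rbf_svr.fit(X, y).score(X, y)
print(f"linear R^2 = {linear_r2:.2f}, RBF R^2 = {rbf_r2:.2f}")
```

The RBF kernel can follow the curved relationship that a linear kernel cannot, which is the practical meaning of mapping the data into a higher-dimensional feature space.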
References
1. Yap CW. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J Comput Chem 2011; 32:1466–1474.