TY - JOUR
T1 - When global and local molecular descriptors are more than the sum of its parts
T2 - Simple, But Not Simpler?
AU - Martínez-López, Yoan
AU - Marrero-Ponce, Yovani
AU - Barigye, Stephen J.
AU - Teran, Enrique
AU - Martínez-Santiago, Oscar
AU - Zambrano, Cesar H.
AU - Torres, F. Javier
N1 - Funding Information:
This work was partially supported by USFQ (Project ID13525, “Chancellor Grant 2018”).
Funding Information:
Yoan Martínez-López thanks the International Investigator Invited program for a postdoctoral fellowship to work at USFQ in 2019. Yovani Marrero-Ponce acknowledges support from the USFQ “Chancellor Grant 2018” (Project ID13525).
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2020/11/1
Y1 - 2020/11/1
AB - In this report, we introduce a set of aggregation operators (AOs) to calculate global and local (group and atom-type) molecular descriptors (MDs), generalizing the classical approach to molecular encoding, which sums the atomic (or fragment) contributions. These AOs are implemented in new, freely available software named MD-LOVIs (http://tomocomd.com/md-lovis), which calculates MDs from vectors of atomic weights and LOVIs (local vertex invariants). The software was developed in the Java programming language and uses the Chemistry Development Kit (CDK) library for handling chemical structures and calculating atomic weights. An analysis of the complexity of the algorithms presented herein demonstrates that they were implemented efficiently. Calculation speed experiments show that MD-LOVIs performs satisfactorily compared with software such as PaDEL, CDKDescriptor, DRAGON and Bluecal. Shannon’s entropy (SE)-based variability studies demonstrate that MD-LOVIs yields indices with greater information content than those of popular academic and commercial software. A principal component analysis reveals that our approach captures chemical information orthogonal to that codified by the DRAGON, PaDEL and Mold2 software, as a result of several generalizations in MD-LOVIs that are not used in other programs. Lastly, three QSAR models were built using multiple linear regression with genetic algorithms, and their statistical parameters demonstrate that the MD-LOVIs indices obtained with AOs perform better than those obtained when the summation operator is used exclusively. Moreover, despite their simplicity, the MD-LOVIs indices yield models with performance comparable or superior to that of other QSAR methodologies reported in the literature.
The studies performed herein collectively demonstrate that the MD-LOVIs software generates indices that are as simple as possible, but not simpler, and that the use of AOs enhances the diversity of the chemical information codified, which in turn improves the performance of traditional MDs.
UR - http://www.scopus.com/inward/record.url?scp=85075806754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075806754&partnerID=8YFLogxK
U2 - 10.1007/s11030-019-10002-3
DO - 10.1007/s11030-019-10002-3
M3 - Research Article
C2 - 31659696
AN - SCOPUS:85075806754
SN - 1381-1991
VL - 24
SP - 913
EP - 932
JO - Molecular Diversity
JF - Molecular Diversity
IS - 4
ER -