TY - CHAP
T1 - Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data
T2 - 5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022
AU - Orjuela-Cañón, Alvaro David
AU - Rodriguez, Diana C.
AU - Perdomo, Oscar
N1 - Funding Information:
The authors acknowledge the support of the Universidad del Rosario for funding this project, as well as the contribution of the research incubator teams Semillero en Inteligencia Artificial en Salud (Semill-IAS) and Semillero SyNERGIA.
Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for the prediction of protein localization. The data used to train the models came from targeting signals and from protein sequences. A third scenario fused both data sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forests and support vector machines. In terms of features, the first source, based on the targeting signal, was the most relevant for the classification.
AB - Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for the prediction of protein localization. The data used to train the models came from targeting signals and from protein sequences. A third scenario fused both data sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forests and support vector machines. In terms of features, the first source, based on the targeting signal, was the most relevant for the classification.
UR - http://www.scopus.com/inward/record.url?scp=85152559412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85152559412&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-29783-0_3
DO - 10.1007/978-3-031-29783-0_3
M3 - Chapter
AN - SCOPUS:85152559412
SN - 9783031297823
T3 - Communications in Computer and Information Science
SP - 31
EP - 43
BT - Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers
A2 - Orjuela-Cañón, Alvaro David
A2 - Lopez, Jesus
A2 - Arias-Londoño, Julian David
A2 - Figueroa-García, Juan Carlos
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 27 July 2022 through 29 July 2022
ER -