Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data

Alvaro David Orjuela-Cañon; Diana C. Rodriguez; Oscar Perdomo

doi:10.1007/978-3-031-29783-0_3

Research output: Chapter in Book/Report › Chapter

Abstract

Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for predicting protein localization. The models were trained on two sources of data, targeting signal information and protein sequences, to determine the localization sites of the protein. A third scenario employed a fusion of both sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forest and support vector machines. In terms of features, the first source, based on the targeting signal, showed the best performance in terms of relevance for the classification.
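The abstract reports results as balanced accuracy over four localization classes. As a minimal sketch of what that metric measures, the snippet below computes balanced accuracy as the mean of per-class recall, assuming the standard definition; the chapter's own implementation is not given in this record. The function name `balanced_accuracy` and the toy labels are hypothetical and are not data from the study.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced toy labels (not taken from the chapter).
y_true = (["cytoplasm"] * 6 + ["inner membrane"] * 2
          + ["periplasm"] + ["outer membrane"])
y_pred = (["cytoplasm"] * 6 + ["inner membrane", "cytoplasm"]
          + ["cytoplasm"] + ["outer membrane"])

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain:.3f}")
```

On imbalanced data such as this toy set, plain accuracy is dominated by the majority class, while balanced accuracy weights each localization class equally, which is one plausible reason to prefer it for a four-class problem of this kind.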

Original language	English (US)
Title of host publication	Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers
Editors	Alvaro David Orjuela-Cañón, Jesus Lopez, Julian David Arias-Londoño, Juan Carlos Figueroa-García
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	31-43
Number of pages	13
ISBN (Print)	9783031297823
DOI	https://doi.org/10.1007/978-3-031-29783-0_3
Status	Published - Jan 1 2023
Event	5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022 - Cali, Colombia. Duration: Jul 27 2022 → Jul 29 2022

Publication series

Name	Communications in Computer and Information Science
Volume	1746 CCIS

Conference

Conference	5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022
Country/Territory	Colombia
City	Cali
Period	7/27/22 → 7/29/22

ASJC Scopus subject areas

General Computer Science
General Mathematics

Cite this

Orjuela-Cañon, A. D., Rodriguez, D. C., & Perdomo, O. (2023). Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data. In A. D. Orjuela-Cañón, J. Lopez, J. D. Arias-Londoño, & J. C. Figueroa-García (Eds.), Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers (pp. 31-43). (Communications in Computer and Information Science; Vol. 1746 CCIS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-29783-0_3

Orjuela-Cañon, Alvaro David ; Rodriguez, Diana C. ; Perdomo, Oscar. / Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data. Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers. editor / Alvaro David Orjuela-Cañón ; Jesus Lopez ; Julian David Arias-Londoño ; Juan Carlos Figueroa-García. Springer Science and Business Media Deutschland GmbH, 2023. pp. 31-43 (Communications in Computer and Information Science).

@inbook{9505825243eb409fa7def377bb453ca9,

title = "Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data",

abstract = "Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for predicting protein localization. The models were trained on two sources of data, targeting signal information and protein sequences, to determine the localization sites of the protein. A third scenario employed a fusion of both sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77\% to 92\%. The best-performing models were based on random forest and support vector machines. In terms of features, the first source, based on the targeting signal, showed the best performance in terms of relevance for the classification.",

author = "Orjuela-Ca{\~n}on, {Alvaro David} and Rodriguez, {Diana C.} and Perdomo, Oscar",

note = "Funding Information: The authors acknowledge the support of the Universidad del Rosario for funding this project, as well as the contribution of the research incubator teams Semillero en Inteligencia Artificial en Salud: Semill-IAS and Semillero SyNERGIA. Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022 ; Conference date: 27-07-2022 through 29-07-2022",

year = "2023",

month = jan,

day = "1",

doi = "10.1007/978-3-031-29783-0_3",

language = "English (US)",

isbn = "9783031297823",

series = "Communications in Computer and Information Science",

volume = "1746",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "31--43",

editor = "Orjuela-Ca{\~n}{\'o}n, {Alvaro David} and Lopez, Jesus and Arias-Londo{\~n}o, {Julian David} and Figueroa-Garc{\'i}a, {Juan Carlos}",

booktitle = "Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers",

address = "Germany",

}

Orjuela-Cañon, AD, Rodriguez, DC & Perdomo, O 2023, Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data. in AD Orjuela-Cañón, J Lopez, JD Arias-Londoño & JC Figueroa-García (eds.), Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers. Communications in Computer and Information Science, vol. 1746 CCIS, Springer Science and Business Media Deutschland GmbH, pp. 31-43, 5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022, Cali, Colombia, 7/27/22. https://doi.org/10.1007/978-3-031-29783-0_3

Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data. / Orjuela-Cañon, Alvaro David; Rodriguez, Diana C.; Perdomo, Oscar.
Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers. ed. / Alvaro David Orjuela-Cañón; Jesus Lopez; Julian David Arias-Londoño; Juan Carlos Figueroa-García. Springer Science and Business Media Deutschland GmbH, 2023. p. 31-43 (Communications in Computer and Information Science; Vol. 1746 CCIS).

TY - CHAP

T1 - Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data

T2 - 5th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2022

AU - Orjuela-Cañon, Alvaro David

AU - Rodriguez, Diana C.

AU - Perdomo, Oscar

N1 - Funding Information: The authors acknowledge the support of the Universidad del Rosario for funding this project, as well as the contribution of the research incubator teams Semillero en Inteligencia Artificial en Salud: Semill-IAS and Semillero SyNERGIA. Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2023/1/1

Y1 - 2023/1/1

AB - Machine learning models can be used to assess the relevance of features in classification systems. Interest in protein analysis based on biomolecular information has grown rapidly. In this work, two sources of such information were compared for determining protein localization in Escherichia coli cells. Support vector machines, artificial neural networks, and random forests were compared for predicting protein localization. The models were trained on two sources of data, targeting signal information and protein sequences, to determine the localization sites of the protein. A third scenario employed a fusion of both sources. Four classes were established according to the subcellular localization of the protein: cytoplasm, periplasmic space, outer membrane, and inner membrane. Balanced accuracy ranged from 77% to 92%. The best-performing models were based on random forest and support vector machines. In terms of features, the first source, based on the targeting signal, showed the best performance in terms of relevance for the classification.

UR - http://www.scopus.com/inward/record.url?scp=85152559412&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85152559412&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-29783-0_3

DO - 10.1007/978-3-031-29783-0_3

M3 - Chapter

AN - SCOPUS:85152559412

SN - 9783031297823

T3 - Communications in Computer and Information Science

SP - 31

EP - 43

BT - Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers

A2 - Orjuela-Cañón, Alvaro David

A2 - Lopez, Jesus

A2 - Arias-Londoño, Julian David

A2 - Figueroa-García, Juan Carlos

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 27 July 2022 through 29 July 2022

ER -

Orjuela-Cañon AD, Rodriguez DC, Perdomo O. Escherichia coli: Analysis of Features for Protein Localization Classification Employing Fusion Data. In Orjuela-Cañón AD, Lopez J, Arias-Londoño JD, Figueroa-García JC, editors, Applications of Computational Intelligence - 5th IEEE Colombian Conference, ColCACI 2022, Revised Selected Papers. Springer Science and Business Media Deutschland GmbH. 2023. p. 31-43. (Communications in Computer and Information Science). doi: 10.1007/978-3-031-29783-0_3
